GLYCOSIDASE ENZYMES
BACKGROUND OF THE INVENTION 

1. Field of the Invention

This invention relates to newly identified polynucleotides, polypeptides encoded by
such polynucleotides, the use of such polynucleotides and polypeptides, as well as the
production and isolation of such polynucleotides and polypeptides. More particularly, the
polynucleotides and polypeptides of the present invention have been putatively identified as
glucosidases, α-galactosidases, β-galactosidases, β-mannosidases, β-mannanases,
endoglucanases, and pullulanases.

2. Description of Related Art 

The glycosidic bond of β-galactosides can be cleaved by different classes of
enzymes: (i) phospho-β-galactosidases (EC 3.2.1.85), which are specific for a phosphorylated
substrate generated via phosphoenolpyruvate phosphotransferase system (PTS)-dependent
uptake; (ii) typical β-galactosidases (EC 3.2.1.23), represented by the Escherichia coli LacZ
enzyme, which are relatively specific for β-galactosides; and (iii) β-glucosidases (EC
3.2.1.21) such as the enzymes of Agrobacterium faecalis, Clostridium thermocellum,
Pyrococcus furiosus or Sulfolobus solfataricus (Day, A.G. and Withers, S.G. (1986)
Purification and characterization of a β-glucosidase from Alcaligenes faecalis. Can. J.
Biochem. Cell Biol. 64, 914-922; Kengen, S.W.M., et al. (1993) Eur. J. Biochem. 213,
305-312; Ait, N., Creuzet, N. and Cattaneo, J. (1982) Properties of β-glucosidase purified
from Clostridium thermocellum. J. Gen. Microbiol. 128, 569-577; Grogan, D.W. (1991)
Evidence that β-galactosidase of Sulfolobus solfataricus is only one of several activities of
a thermostable β-D-glycosidase. Appl. Environ. Microbiol. 57, 1644-1649). Members of
the latter group, although highly specific with respect to the β-anomeric configuration of
the glycosidic linkage, often display a rather relaxed substrate specificity and hydrolyze
β-glucosides as well as β-fucosides and β-galactosides.

Generally, α-galactosidases are enzymes that catalyze the hydrolysis of galactose
groups on a polysaccharide backbone or the cleavage of di- or oligosaccharides
comprising galactose.

Generally, β-mannanases are enzymes that catalyze the hydrolysis of mannose
groups internally on a polysaccharide backbone or the cleavage of di- or
oligosaccharides comprising mannose groups. β-Mannosidases hydrolyze non-reducing,
terminal mannose residues on a mannose-containing polysaccharide and cleave di-
or oligosaccharides comprising mannose groups.

Guar gum is a branched galactomannan polysaccharide composed of a β-1,4 linked
mannose backbone with α-1,6 linked galactose side chains. The enzymes required for the
degradation of guar are β-mannanase, β-mannosidase and α-galactosidase. β-Mannanase
hydrolyzes the mannose backbone internally, β-mannosidase hydrolyzes non-reducing,
terminal mannose residues, and α-galactosidase hydrolyzes α-linked galactose groups.

Galactomannan polysaccharides and the enzymes that degrade them have a variety
of applications. Guar is commonly used as a thickening agent in food and is utilized in
hydraulic fracturing in oil and gas recovery. Consequently, galactomannanases are
industrially relevant for the degradation and modification of guar. Furthermore, a need
exists for thermostable galactomannanases that are active in the extreme conditions
associated with drilling and well stimulation.

There are other applications for these enzymes in various industries, such as the
beet sugar industry. Sugar beets supply 20-30% of domestic U.S. sucrose consumption.
Raw beet sugar can contain a small amount of raffinose when the sugar beets
are stored before processing and rotting begins to set in. Raffinose inhibits the
crystallization of sucrose and also constitutes a hidden quantity of sucrose. Thus, there is
merit to eliminating raffinose from raw beet sugar. α-Galactosidase has also been used as
a digestive aid to break down raffinose, stachyose, and verbascose in such foods as beans
and other gassy foods.

β-Galactosidases which are active and stable at high temperatures appear to be
superior enzymes for the production of lactose-free dietary milk products (Chaplin, M.F.
and Bucke, C. (1990) In: Enzyme Technology, pp. 159-160, Cambridge University Press,
Cambridge, UK). Also, several studies have demonstrated the applicability of
β-galactosidases to the enzymatic synthesis of oligosaccharides via transglycosylation
reactions (Nilsson, K.G.I. (1988) Enzymatic synthesis of oligosaccharides. Trends
Biotechnol. 6, 256-264; Cote, G.L. and Tao, B.Y. (1990) Oligosaccharide synthesis by
enzymatic transglycosylation. Glycoconjugate J. 7, 145-162). Despite the commercial
potential, only a few β-galactosidases of thermophiles have been characterized so far. Two
genes reported encode β-galactoside-cleaving enzymes of the hyperthermophilic bacterium
Thermotoga maritima, one of the most thermophilic organotrophic eubacteria described to
date (Huber, R., Langworthy, T.A., Konig, H., Thomm, M., Woese, C.R., Sleytr, U.B. and
Stetter, K.O. (1986) Thermotoga maritima sp. nov. represents a new genus of unique extremely
thermophilic eubacteria growing up to 90°C. Arch. Microbiol. 144, 324-333). The gene
products have been identified as a β-galactosidase and a β-glucosidase.

Pullulanase is well known as a debranching enzyme of pullulan and starch. The
enzyme hydrolyzes α-1,6-glucosidic linkages in these polymers. Starch degradation for
the production of sweeteners (glucose or maltose) is a very important industrial application
of this enzyme. The degradation of starch proceeds in two stages. The first stage
involves the liquefaction of the substrate with α-amylase, and the second stage, or
saccharification stage, is performed by β-amylase with pullulanase added as a debranching
enzyme to obtain better yields.

Endoglucanases can be used in a variety of industrial applications. For instance, the
endoglucanases of the present invention can hydrolyze the internal β-1,4-glycosidic bonds
in cellulose, which may be used for the conversion of plant biomass into fuels and
chemicals. Endoglucanases also have applications in detergent formulations, the textile
industry, in animal feed, in waste treatment, and in the fruit juice and brewing industry for
the clarification and extraction of juices.

Brief Description of the Drawings

The following drawings are illustrative of embodiments of the invention and are not 
meant to limit the scope of the invention as encompassed by the claims. 

Figures 1a-b are the full-length DNA and corresponding deduced amino acid
sequence of M11TL of the present invention. Sequencing was performed using a 378
automated DNA sequencer for all sequences of the present invention (Applied Biosystems,
Inc.).

Figure 2 is an illustration of the full-length DNA and corresponding deduced amino
acid sequence of OC1/4V-33B/G.

Figure 3 is an illustration of the full-length DNA and corresponding deduced amino
acid sequence of F1-12G.

Figures 4a-b are the full-length DNA and corresponding deduced amino acid
sequence of 9N2-31B/G.

Figures 5a-b are the full-length DNA and corresponding deduced amino acid
sequence of MSB8-6G.

Figure 6 is the full-length DNA and corresponding deduced amino acid sequence
of AEDII12RA-18B/G.

Figures 7a-b are the full-length DNA and corresponding deduced amino acid
sequence of GC74-22G.

Figures 8a-b are the full-length DNA and corresponding deduced amino acid
sequence of VC1-7G1.

Figures 9a-c are the full-length DNA and corresponding deduced amino acid
sequence of 37GP1.

Figures 10a-c are the full-length DNA and corresponding deduced amino acid
sequence of 6GC2.

Figures 11a-d are the full-length DNA and corresponding deduced amino acid
sequence of 6GP2.

Figures 12a-c are the full-length DNA and corresponding deduced amino acid
sequence of 63GB1.

Figures 13a-b are the full-length DNA and corresponding deduced amino acid
sequence of OC1/4V.

Figures 14a-e are the full-length DNA and corresponding deduced amino acid
sequence of 6GP3.

Figures 15a-d are the full-length DNA and corresponding deduced amino acid
sequence of Thermotoga maritima MSB8-6GP2.

Figures 16a-c are the full-length DNA and corresponding deduced amino acid
sequence of Thermotoga maritima MSB8-6GB4.

Figures 17a-d are the full-length DNA and corresponding deduced amino acid
sequence of Bankia gouldi 37GP4.

Figures 18a-b are the full-length DNA and corresponding deduced amino acid
sequence of Pyrococcus furiosus VC1-7EG1.

SUMMARY OF THE INVENTION 

In a preferred embodiment of the present invention, there are provided isolated 
nucleic acids (polynucleotides) which encode mature enzymes having the deduced amino 
acid sequences of Figures 1-18 (SEQ ID NOS: 15-28 and 61-64). 

In another embodiment, the invention provides a method for producing a 
polypeptide including culturing host cells containing the polynucleotide of Figures 1-18 and 
expressing from the host cell a polypeptide encoded by the polynucleotide and isolating the 
polypeptide. 

In another embodiment, the invention provides an enzyme selected from the group
consisting of an enzyme having an amino acid sequence set forth in SEQ ID NOS: 15-28
or 61-64 and an enzyme which has at least 30 consecutive amino acid residues of an enzyme
having an amino acid sequence set forth in SEQ ID NOS: 15-28 or 61-64.

In yet another embodiment, the invention provides a method for generating glucose
from soluble cellooligosaccharides which includes contacting a sample containing
oligosaccharides with an effective amount of an enzyme selected from the group of
enzymes having the amino acid sequence set forth in SEQ ID NOS: 15-28, 61-63 and 64
such that glucose is produced.

The publications discussed herein are provided solely for their disclosure prior to
the filing date of the present application. Nothing herein is to be construed as an
admission that the invention is not entitled to antedate such disclosure by virtue of prior
invention.



Definitions

"Monosaccharide", as used herein, refers to a single polyhydroxy aldehyde or
ketone unit.

"Oligosaccharide", as used herein, refers to short chains of monosaccharide units
joined together by covalent bonds. Of these, the most abundant are the disaccharides,
which have two monosaccharide units.

"Polysaccharide", as used herein, refers to long chains having many
monosaccharide units.

The term "gene" means the segment of DNA involved in producing a polypeptide 
chain; it includes regions preceding and following the coding region (leader and trailer) as 
well as intervening sequences (introns) between individual coding segments (exons). 

A coding sequence is "operably linked to" another coding sequence when RNA
polymerase will transcribe the two coding sequences into a single mRNA, which is then
translated into a single polypeptide having amino acids derived from both coding
sequences. The coding sequences need not be contiguous to one another so long as the
expressed sequences are ultimately processed to produce the desired protein.

"Recombinant" enzymes refer to enzymes produced by recombinant DNA
techniques; i.e., produced from cells transformed by an exogenous DNA construct encoding
the desired enzyme. "Synthetic" enzymes are those prepared by chemical synthesis.

A DNA "coding sequence of" or a "nucleotide sequence encoding" a particular
enzyme is a DNA sequence which is transcribed and translated into an enzyme when
placed under the control of appropriate regulatory sequences.

Detailed Description of the Invention

The polynucleotides and polypeptides of the present invention have been identified
as glucosidases, α-galactosidases, β-galactosidases, β-mannosidases, β-mannanases,
endoglucanases, and pullulanases as a result of their enzymatic activity.

In accordance with one aspect of the present invention, there are provided novel
enzymes, as well as active fragments, analogs and derivatives thereof.

In accordance with another aspect of the present invention, there are provided 
isolated nucleic acid molecules encoding the enzymes of the present invention including 
mRNAs, cDNAs, genomic DNAs as well as active analogs and fragments of such enzymes. 

In accordance with yet a further aspect of the present invention, there is provided 
a process for producing such polypeptides by recombinant techniques comprising culturing 
recombinant prokaryotic and/or eukaryotic host cells, containing a nucleic acid sequence 
of the present invention, under conditions promoting expression of said enzymes and 
subsequent recovery of said enzymes. 

In accordance with yet a further aspect of the present invention, there is provided
a process for utilizing such enzymes, or polynucleotides encoding such enzymes, for
hydrolyzing lactose to galactose and glucose for use in the food processing industry, the
pharmaceutical industry (for example, to treat intolerance to lactose), as a diagnostic reporter
molecule, in corn wet milling, in the fruit juice industry, in baking, in the textile industry
and in the detergent industry.

In accordance with yet a further aspect of the present invention, there is provided
a process for utilizing such enzymes for hydrolyzing guar gum (a galactomannan
polysaccharide) to remove non-reducing terminal mannose residues. Further,
polysaccharides such as galactomannan and the enzymes according to the invention that
degrade them have a variety of applications. Guar gum is commonly used as a thickening
agent in food and also is utilized in hydraulic fracturing in oil and gas recovery.
Consequently, mannanases are industrially relevant for the degradation and modification
of guar gums. Furthermore, a need exists for thermostable mannanases that are active in
the extreme conditions associated with drilling and well stimulation.

In accordance with yet a further aspect of the present invention, there are also
provided nucleic acid probes comprising nucleic acid molecules of sufficient length to
specifically hybridize to a nucleic acid sequence of the present invention.

In accordance with yet a further aspect of the present invention, there is provided 
a process for utilizing such enzymes, or polynucleotides encoding such enzymes, for in 
vitro purposes related to scientific research, for example, to generate probes for identifying 
similar sequences which might encode similar enzymes from other organisms by using 
certain regions, i.e., conserved sequence regions, of the nucleotide sequence. 

These and other aspects of the present invention should be apparent to those skilled
in the art from the teachings herein.

The polynucleotides of this invention were originally recovered from genomic gene
libraries derived from the following organisms:

M11TL is a new species of Desulfurococcus isolated from Diamond Pool in
Yellowstone National Park. The organism grows optimally at 85-88°C, pH 7.0 in a low salt
medium containing yeast extract, peptone, and gelatin as substrates with a N2/CO2 gas
phase.

OC1/4V is from the genus Thermotoga. The organism was isolated from
Yellowstone National Park. It grows optimally at 75°C in a low salt medium with cellulose
as a substrate and N2 in gas phase.

Pyrococcus furiosus VC1 (and 7EG1) is from the genus Pyrococcus. VC1 was
isolated from Vulcano, Italy. It grows optimally at 100°C in a high salt medium (marine)
containing elemental sulfur, yeast extract, peptone and starch as substrates and N2 in gas
phase.

Staphylothermus marinus F1 is from the genus Staphylothermus. F1 was isolated
from Vulcano, Italy. It grows optimally at 85°C, pH 6.5 in high salt medium (marine)
containing elemental sulfur and yeast extract as substrates and N2 in gas phase.

Thermococcus 9N-2 is from the genus Thermococcus. 9N-2 was isolated from
diffuse vent fluid in the East Pacific Rise. It is a strict anaerobe that grows optimally at
87°C.

Thermotoga maritima MSB8 (including clones 6GP2 and 6GB4) is from the
genus Thermotoga, and was isolated from Vulcano, Italy. MSB8 grows optimally at 85°C,
pH 6.5 in a high salt medium (marine) containing starch and yeast extract as substrates and
N2 in gas phase.

Thermococcus alcaliphilus AEDII12RA is from the genus Thermococcus.
AEDII12RA grows optimally at 85°C, pH 9.5 in a high salt medium (marine) containing
polysulfides and yeast extract as substrates and N2 in gas phase.

Thermococcus chitonophagus GC74 is from the genus Thermococcus. GC74 grows
optimally at 85°C, pH 6.0 in a high salt medium (marine) containing chitin, meat extract,
elemental sulfur and yeast extract as substrates and N2 in gas phase.

AEPII 1a grows optimally at 85°C at pH 6.5 in marine medium under anaerobic
conditions. It has many substrates.

Bankia gouldi is from the genus Bankia.

Accordingly, the polynucleotides and enzymes encoded thereby are identified by
the organism from which they were isolated, and are sometimes hereinafter referred to as
"M11TL" (Figure 1 and SEQ ID NOS:1 and 15), "OC1/4V-33B/G" (Figure 2 and SEQ ID
NOS:2 and 16), "F1-12G" (Figure 3 and SEQ ID NOS:3 and 17), "9N2-31B/G" (Figure 4
and SEQ ID NOS:4 and 18), "MSB8" (Figure 5 and SEQ ID NOS:5 and 19),
"AEDII12RA-18B/G" (Figure 6 and SEQ ID NOS:6 and 20), "GC74-22G" (Figure 7 and SEQ ID NOS:7
and 21), "VC1-7G1" (Figure 8 and SEQ ID NOS:8 and 22), "37GP1" (Figure 9 and SEQ
ID NOS:9 and 23), "6GC2" (Figure 10 and SEQ ID NOS:10 and 24), "6GP2" (Figure 11
and SEQ ID NOS:11 and 25), "AEPII 1a" (Figure 12 and SEQ ID NOS:12 and 26),
"OC1/4V" (Figure 13 and SEQ ID NOS:13 and 27), "6GP3" (Figure 14 and SEQ ID
NOS:28), "MSB8-6GP2" (Figure 15 and SEQ ID NOS:57 and 61), "MSB8-6GB4" (Figure
16 and SEQ ID NOS:58 and 62), "VC1-7EG1" (Figure 17 and SEQ ID NOS:59 and 63),
and "37GP4" (Figure 18 and SEQ ID NOS:60 and 64).

The polynucleotides and polypeptides of the present invention show identity at the
nucleotide and protein level to known genes and proteins encoded thereby as shown in
Table 1.

Table 1

Clone                                       Gene/Protein with Closest Homology                          Protein Identity  Nucleic Acid Identity

M11TL-29G                                   Sulfolobus solfataricus DSM1616/P1, β-galactosidase         51%               55%
OC1/4V-33B/G                                Caldocellum saccharolyticum, β-glucosidase                  52%               57%
Staphylothermus marinus F1-12G              Bacillus polymyxa, β-galactosidase                          36%               48%
Thermococcus 9N2-31B/G                      Sulfolobus solfataricus ATCC 49255/MT4, β-galactosidase     51%               50%
Thermotoga maritima MSB8-6G                 Clostridium thermocellum bglB                               45%               53%
Thermococcus AEDII12RA-18B/G                Bacillus polymyxa, β-galactosidase                          34%               48%
Thermococcus chitonophagus GC74-22G         Sulfolobus solfataricus ATCC 49255/MT4, β-galactosidase     46%               54%
Pyrococcus furiosus VC1-7G1                 Sulfolobus solfataricus/MT-4 β-galactosidase                46.4%             52.5%
Thermotoga maritima α-galactosidase (6GC2)  Pediococcus pentosaceus α-galactosidase                     49%               29%
Thermotoga maritima β-mannanase (6GP2)      Aspergillus aculeatus mannanase                             56%               37%
AEPII 1a β-mannosidase (63GB1)              Sulfolobus solfataricus β-galactosidase                     78%               56%
OC1/4V endoglucanase (33GP1)                Clostridium thermocellum endo-1,4-β-endoglucanase           65%               43%
Thermotoga maritima pullulanase (6GP3)      Caldocellum saccharolyticum α-dextran 6-glucanohydrolase    72%               53%
Bankia gouldi mix endoglucanase (37GP1)     None available

The polynucleotides and enzymes of the present invention show homology to each 
other as shown in Table 2. 



WO 98/24799    PCT/US97/22623

Table 2

Clone                                       Gene/Protein with Closest Homology                          Protein Identity  Nucleic Acid Identity

Staphylothermus marinus F1-12G              Thermococcus AEDII12RA-18B/G, β-galactosidase, glucosidase  [illegible]       57%
Thermococcus 9N2-31B/G                      Thermococcus chitonophagus GC74-22G-glucosidase             74%               66%
Pyrococcus furiosus VC1-7G1                 Pyrococcus furiosus VC1-7B/G β-galactosidase                46.4%             54%


All the clones identified in Tables 1 and 2 encode polypeptides which have
α-glycosidase or β-glycosidase activity.

This invention, in addition to the isolated nucleic acid molecules encoding the
enzymes of the present invention, also provides substantially similar sequences. Isolated
nucleic acid sequences are substantially similar if: (i) they are capable of hybridizing under
conditions hereinafter described to the polynucleotides of SEQ ID NOS: 1-14 and 57-60;
or (ii) they are DNA sequences which are degenerate to the polynucleotides of SEQ ID
NOS: 1-14 and 57-60. Degenerate DNA sequences encode the amino acid sequences of
SEQ ID NOS: 15-28 and 61-64, but have variations in the nucleotide coding sequences. As
used herein, substantially similar refers to sequences having similar identity to the
sequences of the instant invention. The nucleotide sequences that are substantially the same
can be identified by hybridization or by sequence comparison. Enzyme sequences that are
substantially the same can be identified by one or more of the following: proteolytic
digestion, gel electrophoresis and/or microsequencing.
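
By way of illustration only, the degeneracy criterion described above can be checked computationally. The following minimal Python sketch assumes the standard genetic code; the names CODON_TABLE, translate and is_degenerate_variant are hypothetical and do not appear in the original disclosure. It reports whether two coding sequences differ at the nucleotide level while encoding the same amino acid sequence.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (an assumption of
# this sketch; the disclosure does not name a translation table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_degenerate_variant(reference: str, candidate: str) -> bool:
    """True if the two sequences differ in bases but encode the same protein."""
    if reference.upper() == candidate.upper():
        return False  # identical sequences are not variants
    return translate(reference) == translate(candidate)
```

For example, under these assumptions "ATGGCTTAA" and "ATGGCCTAG" both translate to Met-Ala-stop and would be treated as degenerate to one another.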

One means for isolating the nucleic acid molecules encoding the enzymes of the
present invention is to probe a gene library with a natural or artificially designed probe
using art recognized procedures (see, for example: Current Protocols in Molecular Biology,
Ausubel F.M. et al. (eds.) Greene Publishing Company Assoc. and John Wiley Interscience,
New York, 1989, 1992). It is appreciated by one skilled in the art that the polynucleotides
of SEQ ID NOS: 1-14 and 57-60 or fragments thereof (comprising at least 12 contiguous
nucleotides) are particularly useful probes. Other particularly useful probes for this purpose
are fragments hybridizable to the sequences of SEQ ID NOS: 1-14 and 57-60 (i.e.,
comprising at least 12 contiguous nucleotides).
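
As a non-limiting illustration of the 12-contiguous-nucleotide criterion, the sketch below lists every window of a chosen length that occurs in both a reference sequence and a target sequence. The function name shared_probe_windows and the example sequences are hypothetical. The sketch considers exact matches on the given strand only and is not a model of hybridization, which also tolerates mismatches and involves the complementary strand.

```python
def shared_probe_windows(reference: str, target: str, min_len: int = 12) -> list[str]:
    """Return the sorted, de-duplicated min_len-mers present in both sequences."""
    reference = reference.upper()
    target = target.upper()
    # range() is empty when a sequence is shorter than min_len, giving no windows.
    ref_windows = {
        reference[i:i + min_len] for i in range(len(reference) - min_len + 1)
    }
    tgt_windows = {
        target[i:i + min_len] for i in range(len(target) - min_len + 1)
    }
    return sorted(ref_windows & tgt_windows)


# Hypothetical sequences sharing a 14-base stretch, hence three 12-base windows.
example = shared_probe_windows("ATGGCTAGCTAGGCTTACGATCG", "CCCGCTAGCTAGGCTTAAA")
```

Any window returned by such a search could, under the stated assumptions, serve as a candidate probe fragment of the minimum length recited above.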

With respect to nucleic acid sequences which hybridize to specific nucleic acid
sequences disclosed herein, hybridization may be carried out under conditions of reduced
stringency, medium stringency or even stringent conditions. As an example of
oligonucleotide hybridization, a polymer membrane containing immobilized denatured
nucleic acids is first prehybridized for 30 minutes at 45°C in a solution consisting of 0.9 M
NaCl, 50 mM NaH2PO4, pH 7.0, 5.0 mM Na2EDTA, 0.5% SDS, 10X Denhardt's, and 0.5
mg/ml polyriboadenylic acid. Approximately 2 X 10^7 cpm (specific activity 4-9 X 10^8
cpm/µg) of 32P end-labeled oligonucleotide probe are then added to the solution. After
12-16 hours of incubation, the membrane is washed for 30 minutes at room temperature in 1X
SET (150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM Na2EDTA) containing 0.5%
SDS, followed by a 30 minute wash in fresh 1X SET at Tm-10°C for the oligonucleotide
probe. The membrane is then exposed to auto-radiographic film for detection of
hybridization signals.
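
The final wash temperature in the example above depends on the melting temperature (Tm) of the probe, which the text does not specify how to obtain. The sketch below reads the wash step as 10°C below the probe Tm and, purely as an assumption, estimates Tm for a short oligonucleotide with the Wallace rule (2°C per A or T, 4°C per G or C). The function names are hypothetical, and other Tm estimates may be more appropriate for a given probe and buffer.

```python
def wallace_tm(oligo: str) -> float:
    """Estimate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("oligo must be a non-empty string of A, C, G and T")
    return 2.0 * at + 4.0 * gc


def final_wash_temperature(oligo: str, offset_c: float = 10.0) -> float:
    """Temperature of the final wash, taken as offset_c degrees below the Tm."""
    return wallace_tm(oligo) - offset_c
```

For a hypothetical 18-mer with ten A/T and eight G/C bases, this estimate gives a Tm of 52°C and a final wash at 42°C.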

Stringent conditions means hybridization will occur only if there is at least 90%
identity, preferably at least 95% identity and most preferably at least 97% identity between
the sequences. See J. Sambrook et al., Molecular Cloning, A Laboratory Manual, 2d Ed.,
Cold Spring Harbor Laboratory (1989), which is hereby incorporated by reference in its
entirety. Further, it is understood that a fragment of a 100 bps sequence that is 95 bps in
length has 95% identity with the 100 bps sequence from which it is obtained.

As used herein, a first DNA (RNA) sequence is at least 70% and preferably at least
80% identical to another DNA (RNA) sequence if there is at least 70% and preferably at
least 80% or 90% identity, respectively, between the bases of the first sequence and the
bases of the other sequence, when properly aligned with each other, for example when
aligned by BLASTN.

"Identity", as the term is used herein, refers to a polynucleotide sequence which
comprises a percentage of the same bases as a reference polynucleotide (SEQ ID NOS:1-14
and 57-60). For example, a polynucleotide which is at least 90% identical to a reference
polynucleotide has polynucleotide bases which are identical in 90% of the bases which
make up the reference polynucleotide and may have different bases in 10% of the bases
which comprise that polynucleotide sequence.
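
The identity calculation defined above can be sketched as follows, by way of example only. The function percent_identity is hypothetical; it assumes that the two sequences have already been aligned (for instance by BLASTN) into strings of equal length with "-" marking gaps, and it counts identical positions as a percentage of the bases that make up the reference sequence.

```python
def percent_identity(reference_aln: str, query_aln: str) -> float:
    """Percent of reference bases matched by the query in a pairwise alignment."""
    if len(reference_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    reference_bases = 0
    matches = 0
    for ref_base, query_base in zip(reference_aln.upper(), query_aln.upper()):
        if ref_base == "-":
            continue  # query insertion: not a base of the reference
        reference_bases += 1
        if ref_base == query_base:
            matches += 1
    if reference_bases == 0:
        raise ValueError("reference contains no bases")
    return 100.0 * matches / reference_bases
```

Under this definition, a 95-base fragment aligned against the 100-base sequence from which it was taken scores 95%, and a 10-base sequence differing from its reference at one position scores 90%.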

The present invention relates to polynucleotides which differ from the reference
polynucleotide such that the changes are silent changes, for example changes that do not alter
the amino acid sequence encoded by the polynucleotide. The present invention also relates
to nucleotide changes which result in amino acid substitutions, additions, deletions, fusions
and truncations in the polypeptide encoded by the reference polynucleotide. In a preferred
aspect of the invention these polypeptides retain the same biological action as the
polypeptide encoded by the reference polynucleotide.

It is also appreciated that such probes can be and are preferably labeled with an
analytically detectable reagent to facilitate identification of the probe. Useful reagents
include but are not limited to radioactivity, fluorescent dyes or enzymes capable of
catalyzing the formation of a detectable product. The probes are thus useful to isolate
complementary copies of DNA from other sources or to screen such sources for related
sequences.

The polynucleotides of this invention were recovered from genomic gene libraries
from the organisms listed in Table 1. For example, gene libraries can be generated in the
Lambda ZAP II cloning vector (Stratagene Cloning Systems). Mass excisions can be
performed on these libraries to generate libraries in the pBluescript phagemid. Libraries
are thus generated and excisions performed according to the protocols/methods hereinafter
described.

The excision libraries are introduced into the E. coli strain BW14893 F'kan1A.
Expression clones are then identified using a high temperature filter assay. Expression
clones encoding several glucanases and several other glycosidases are identified and
repurified. The polynucleotides, and enzymes encoded thereby, of the present invention
yield the activities as described above.

The coding sequences for the enzymes of the present invention were identified by
screening the genomic DNAs prepared for the clones having glucosidase or galactosidase
activity.

An example of such an assay is a high temperature filter assay wherein expression
clones were identified using buffer Z (see recipe below) containing 1 mg/ml of the
substrate 5-bromo-4-chloro-3-indolyl-β-D-glucopyranoside (XGLU) (Diagnostic
Chemicals Limited or Sigma) after introducing an
excision library into the E. coli strain BW14893 F'kan1A. Expression clones encoding
XGLUases were identified and repurified from M11TL, OC1/4V, Pyrococcus furiosus VC1,
Staphylothermus marinus F1, Thermococcus 9N-2, Thermotoga maritima MSB8,
Thermococcus alcaliphilus AEDII12RA, and Thermococcus chitonophagus GC74.

Z-buffer: (referenced in Miller, J.H. (1992) A Short Course in Bacterial Genetics,
p. 445.)

per liter:

Na2HPO4-7H2O        16.1 g
NaH2PO4-H2O         5.5 g
KCl                 0.75 g
MgSO4-7H2O          0.246 g
β-mercaptoethanol   2.7 ml
Adjust pH to 7.0
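
Because the recipe above is stated per liter, smaller batches require proportional scaling. The following sketch is offered for convenience only: the dictionary keys and the function scale_recipe are hypothetical names, the quantities are those listed in the recipe, and the hydrate forms of the salts are left as given in the recipe rather than asserted here.

```python
# Per-liter Z-buffer quantities as listed in the recipe (grams unless noted).
Z_BUFFER_PER_LITER = {
    "Na2HPO4 hydrate (g)": 16.1,
    "NaH2PO4 hydrate (g)": 5.5,
    "KCl (g)": 0.75,
    "MgSO4 heptahydrate (g)": 0.246,
    "beta-mercaptoethanol (ml)": 2.7,
}


def scale_recipe(per_liter: dict[str, float], volume_ml: float) -> dict[str, float]:
    """Scale per-liter quantities linearly to the requested batch volume."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    factor = volume_ml / 1000.0
    return {name: amount * factor for name, amount in per_liter.items()}


# Example: quantities for a 250 ml batch; pH is still adjusted to 7.0 afterwards.
quarter_liter = scale_recipe(Z_BUFFER_PER_LITER, 250)
```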

High Temperature Filter Assay
(1) The F' factor F'kan (from E. coli strain CSH118)(1) was introduced into the
pho-pnh-lac- strain BW14893(2). The filamentous phage library was plated
on the resulting strain, BW14893 F'kan. (Miller, J.H. (1992) A Short Course in
Bacterial Genetics; Lee, K.S., Metcalf, et al., (1992) Evidence for two phosphonate
degradative pathways in Enterobacter aerogenes, J. Bacteriol., 174:2501-2510.)

(2) After growth on 100 mm LB plates containing 100 µg/ml ampicillin, 80 µg/ml
methicillin and 1 mM IPTG, colony lifts were performed using Millipore HATF
membrane filters.

(3) The colonies transferred to the filters were lysed with chloroform vapor in 150 mm
glass petri dishes.

(4) The filters were transferred to 100 mm glass petri dishes containing a piece of
Whatman 3MM filter paper saturated with buffer.

(a) When testing for galactosidase activity (XGALase), 3MM paper was
saturated with Z buffer containing 1 mg/ml XGAL (ChemBridge
Corporation). After transferring the filter bearing lysed colonies to the glass
petri dish, the dish was placed in an oven at 80-85°C.

(b) When testing for glucosidase activity (XGLUase), 3MM paper was saturated
with Z buffer containing 1 mg/ml XGLU. After transferring the filter bearing
lysed colonies to the glass petri dish, the dish was placed in an oven at 80-85°C.

(5) 'Positives' were observed as blue spots on the filter membranes. The following
filter rescue technique was used to retrieve plasmid from a lysed positive colony. A Pasteur
pipette (or glass capillary tube) was used to core blue spots on the filter membrane. The
small filter disk was placed in an Eppendorf tube containing 20 µl water. The
Eppendorf tube was incubated at 75°C for 5 minutes followed by vortexing to elute plasmid DNA
off the filter. This DNA was transformed into electrocompetent E. coli DH10B cells
for Thermotoga maritima MSB8-6G, Staphylothermus marinus F1-12G,
Thermococcus AEDII12RA-18B/G, Thermococcus chitonophagus GC74-22G,
M11TL and OC1/4V. Electrocompetent BW14893 F'kan1A E. coli were used for
Thermococcus 9N2-31B/G and Pyrococcus furiosus VC1-7G1. The filter-lift
assay was repeated on transformation plates to identify 'positives'. The transformation plates
were returned to a 37°C incubator after the filter lift to regenerate colonies. 3 ml of LB liquid
containing 100 µg/ml ampicillin was inoculated with repurified positives and incubated at 37°C
overnight. Plasmid DNA was isolated from these cultures and the plasmid insert was sequenced.
In some instances where the plates used for the initial colony lifts contained
non-confluent colonies, a specific colony corresponding to a blue spot on the filter could
be identified on a regenerated plate and repurified directly, instead of using the filter
rescue technique.

Another example of such an assay is a variation of the high temperature filter assay
wherein colony-laden filters are heat-killed at different temperatures (for example, 105°C
for 20 minutes) to monitor thermostability. The 3MM paper is saturated with different
buffers (e.g., 100 mM NaCl, 5 mM MgCl2, 100 mM Tris-Cl (pH 9.5)) to determine enzyme
activity under different buffer conditions.

A β-glucosidase assay may also be employed, wherein GlcpβNp is used as an
artificial substrate (aryl-β-glucosidase). The increase in absorbance at 405 nm as a result
of p-nitrophenol (pNp) liberation is followed on a Hitachi U-1100 spectrophotometer
equipped with a thermostatted cuvette holder. The assays may be performed at 80°C or
90°C in a closed 1-ml quartz cuvette. A standard reaction mixture contains 150 mM
trisodium substrate, pH 5.0 (at 80°C), and 0.95 mM pNp derivative (ε pNp = 0.561 mM⁻¹ cm⁻¹).
The reaction mixture is allowed to reach the desired temperature, after which the
reaction is started by injecting an appropriate amount of enzyme (1.06 ml final volume).

One unit (1 U) of β-glucosidase activity is defined as that amount required to catalyze the
formation of 1.0 µmol pNp/min. D-cellobiose may also be used as a substrate.
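
The unit definition above can be expressed as a short calculation. The following Python sketch is illustrative only: it assumes a 1 cm optical path length (not stated in the text), takes the extinction coefficient of 0.561 mM⁻¹ cm⁻¹ and the 1.06 ml final volume from the assay description, and uses a hypothetical function name.

```python
def glucosidase_units(delta_a405_per_min, epsilon_mm=0.561, path_cm=1.0, volume_ml=1.06):
    """Return activity in units (umol pNp released per minute).

    epsilon_mm : extinction coefficient of pNp in mM^-1 cm^-1 (value from the text)
    path_cm    : cuvette path length in cm (assumed to be 1 cm)
    volume_ml  : final reaction volume in ml (value from the text)
    """
    # Beer-Lambert law: rate of concentration change (mM/min) = (dA/min) / (epsilon * path)
    rate_mm_per_min = delta_a405_per_min / (epsilon_mm * path_cm)
    # 1 mM is 1 umol/ml, so multiplying by the volume in ml gives umol/min
    return rate_mm_per_min * volume_ml
```

Under these assumptions, an absorbance increase of 0.561 per minute corresponds to 1 mM pNp per minute, which in a 1.06 ml reaction is 1.06 U.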

An ONPG assay for β-galactosidase activity is described by Miller, J.H. (1992) A
Short Course in Bacterial Genetics and Miller, J.H. (1992) Experiments in Molecular
Genetics, the contents of which are hereby incorporated by reference in their entirety.

A quantitative fluorometric assay for β-galactosidase specific activity is described
by Youngman, P. (1987) Plasmid Vectors for Recovering and Exploiting Tn917
Transpositions in Bacillus and other Gram-Positive Bacteria. In Plasmids: A Practical
Approach (ed. K. Hardy) pp 79-103. IRL Press, Oxford. A description of the procedure can
be found in Miller (1992) p. 75-77, the contents of which are incorporated by reference
herein in their entirety.

The polynucleotides of the present invention may be in the form of DNA, which
DNA includes cDNA, genomic DNA, and synthetic DNA. The DNA may be double-
stranded or single-stranded, and if single stranded may be the coding strand or non-coding
(anti-sense) strand. The coding sequences which encode the mature enzymes may be
identical to the coding sequences shown in Figures 1-18 (SEQ ID NOS: 1-14 and 57-60) or
may be a different coding sequence which coding sequence, as a result of the redundancy
or degeneracy of the genetic code, encodes the same mature enzymes as the DNA of
Figures 1-18 (SEQ ID NOS: 1-14 and 57-60).
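
The degeneracy referred to above can be illustrated with a short Python sketch. The two DNA strings below are made-up examples, not sequences from the Figures, and the codon table is only the subset of the standard genetic code needed for them; the function name is likewise hypothetical.

```python
# Subset of the standard genetic code covering only the codons used below.
CODON_TABLE = {
    "ATG": "M",
    "CTA": "L", "CTT": "L",
    "CCA": "P", "CCT": "P",
    "GAA": "E", "GAG": "E",
    "GGC": "G", "GGT": "G",
}

def translate(dna):
    """Translate a DNA coding sequence codon by codon (illustrative helper)."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

seq_a = "ATGCTACCAGAAGGC"  # hypothetical coding sequence
seq_b = "ATGCTTCCTGAGGGT"  # differs at four third-codon positions

# Different nucleotide sequences, same encoded peptide.
print(translate(seq_a), translate(seq_b))
```

Both strings translate to the same five-residue peptide even though they differ at four nucleotide positions, which is the sense in which a "different coding sequence" can encode the same mature enzyme.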

The polynucleotide which encodes for the mature enzyme of Figures 1-18 (SEQ ID 
NOS: 15-28 and 61-64) may include, but is not limited to: only the coding sequence for the 
mature enzyme; the coding sequence for the mature enzyme and additional coding sequence 
such as a leader sequence or a proprotein sequence; the coding sequence for the mature 
enzyme (and optionally additional coding sequence) and non-coding sequence, such as
introns or non-coding sequence 5' and/or 3' of the coding sequence for the mature enzyme. 

Thus, the term "polynucleotide encoding an enzyme (protein)" encompasses a 
polynucleotide which includes only coding sequence for the enzyme as well as a 
polynucleotide which includes additional coding and/or non-coding sequence. 

The present invention further relates to variants of the hereinabove described 
polynucleotides which encode for fragments, analogs and derivatives of the enzymes having 
the deduced amino acid sequences of Figures 1-18 (SEQ ID NOS: 15-28 and 61-64). The 
variant of the polynucleotide may be a naturally occurring allelic variant of the 
polynucleotide or a non-naturally occurring variant of the polynucleotide. 

Thus, the present invention includes polynucleotides encoding the same mature 
enzymes as shown in Figures 1-18 (SEQ ID NOS: 15-28 and 61-64) as well as variants of 
such polynucleotides which variants encode for a fragment, derivative or analog of the 
enzymes of Figures 1-18 (SEQ ID NOS: 15-28 and 61-64). Such nucleotide variants 
include deletion variants, substitution variants and addition or insertion variants. 

As hereinabove indicated, the polynucleotides may have a coding sequence which
is a naturally occurring allelic variant of the coding sequences shown in Figures 1-18 (SEQ
ID NOS: 1-14 and 57-60). As known in the art, an allelic variant is an alternate form of a
polynucleotide sequence which may have a substitution, deletion or addition of one or more
nucleotides, which does not substantially alter the function of the encoded enzyme.

Fragments of the full length gene of the present invention may be used as a 
hybridization probe for a cDNA or a genomic library to isolate the full length DNA and to 
isolate other DNAs which have a high sequence similarity to the gene or similar biological 
activity. Probes of this type preferably have at least 10, preferably at least 15, and even 
more preferably at least 30 bases and may contain, for example, at least 50 or more bases. 
The probe may also be used to identify a DNA clone corresponding to a full length 
transcript and a genomic clone or clones that contain the complete gene including 
regulatory and promoter regions, exons, and introns. An example of a screen comprises 
isolating the coding region of the gene by using the known DNA sequence to synthesize an 
oligonucleotide probe. Labeled oligonucleotides having a sequence complementary to that 
of the gene of the present invention are used to screen a library of genomic DNA to 
determine which members of the library the probe hybridizes to. 

The present invention further relates to polynucleotides which hybridize to the 
hereinabove-described sequences if there is at least 70%, preferably at least 90%, and more 
preferably at least 95% identity between the sequences. The present invention particularly 
relates to polynucleotides which hybridize under stringent conditions to the hereinabove- 
described polynucleotides. As herein used, the term "stringent conditions" means 
hybridization will occur only if there is at least 95% and preferably at least 97% identity 
between the sequences. The polynucleotides which hybridize to the hereinabove described 
polynucleotides in a preferred embodiment encode enzymes which retain
substantially the same biological function or activity as the mature enzyme encoded by the 
DNA of Figures 1-18 (SEQ ID NOS: 1-14 and 57-60). 
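
The identity thresholds above can be checked with a simple calculation. The Python sketch below is a minimal illustration, assuming that the two sequences have already been aligned to equal length and that percent identity is taken as matching columns divided by alignment length; the text does not specify an alignment method or identity formula, and both function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

def identity_tier(pct):
    """Return the highest threshold from the text (70, 90 or 95) that pct meets, else None."""
    for threshold in (95, 90, 70):
        if pct >= threshold:
            return threshold
    return None
```

For example, two aligned 10-base sequences that differ at one position are 90% identical, which meets the "at least 90%" tier but not the 95% figure used in the definition of "stringent conditions".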

Alternatively, the polynucleotide may have at least 15 bases, preferably at least 30 
bases, and more preferably at least 50 bases which hybridize to any part of a polynucleotide 
of the present invention and which has an identity thereto, as hereinabove described, and 
which may or may not retain activity. For example, such polynucleotides may be employed
as probes for the polynucleotides of SEQ ID NOS: 1-14 and 57-60, for example, for
recovery of the polynucleotide or as a diagnostic probe or as a PCR primer.

Thus, the present invention is directed to polynucleotides having at least a 70% 
identity, preferably at least 90% identity and more preferably at least a 95% identity to a 
polynucleotide which encodes the enzymes of SEQ ID NOS: 15-28 and 61-64 as well as 
fragments thereof, which fragments have at least 15 bases, preferably at least 30 bases and 
most preferably at least 50 bases, which fragments are at least 90% identical, preferably at 
least 95% identical and most preferably at least 97% identical under stringent conditions 
to any portion of a polynucleotide of the present invention. 

The present invention further relates to enzymes which have the deduced amino acid
sequences of Figures 1-18 (SEQ ID NOS: 15-28 and 61-64) as well as fragments, analogs
and derivatives of such enzymes.

The terms "fragment," "derivative" and "analog" when referring to the enzymes of 
Figures 1-18 (SEQ ID NOS: 15-28 and 61-64) mean enzymes which retain essentially the
same biological function or activity as such enzymes. Thus, an analog includes a proprotein 
which can be activated by cleavage of the proprotein portion to produce an active mature 
enzyme. 

The enzymes of the present invention may be a recombinant enzyme, a natural 
enzyme or a synthetic enzyme, preferably a recombinant enzyme. 

The fragment, derivative or analog of the enzymes of Figures 1-18 (SEQ ID NOS:
15-28 and 61-64) may be (i) one in which one or more of the amino acid residues are
substituted with a conserved or non-conserved amino acid residue (preferably a conserved
amino acid residue) and such substituted amino acid residue may or may not be one
encoded by the genetic code, or (ii) one in which one or more of the amino acid residues
includes a substituent group, or (iii) one in which the mature enzyme is fused with another
compound, such as a compound to increase the half-life of the enzyme (for example,
polyethylene glycol), or (iv) one in which the additional amino acids are fused to the mature
enzyme, such as a leader or secretory sequence or a sequence which is employed for
purification of the mature enzyme or a proprotein sequence. Such fragments, derivatives
and analogs are deemed to be within the scope of those skilled in the art from the teachings
herein.

The enzymes and polynucleotides of the present invention are preferably provided 
in an isolated form, and preferably are purified to homogeneity. 

The term "isolated" means that the material is removed from its original 
environment (e.g., the natural environment if it is naturally occurring). For example, a 
naturally-occurring polynucleotide or enzyme present in a living animal is not isolated, but 
the same polynucleotide or enzyme, separated from some or all of the coexisting materials 
in the natural system, is isolated. Such polynucleotides could be part of a vector and/or 
such polynucleotides or enzymes could be part of a composition, and still be isolated in that 
such vector or composition is not part of its natural environment. 

The enzymes of the present invention include the enzymes of SEQ ID NOS: 15-28 
and 61-64 (in particular the mature enzyme) as well as enzymes which have at least 70% 
similarity (preferably at least 70% identity) to the enzymes of SEQ ID NOS: 15-28 and 61- 
64 and more preferably at least 90% similarity (more preferably at least 90% identity) to 
the enzymes of SEQ ID NOS: 15-28 and 61-64 and still more preferably at least 95% 
similarity (still more preferably at least 95% identity) to the enzymes of SEQ ID NOS: 15- 
28 and 61-64 and also include portions of such enzymes with such portion of the enzyme 
generally containing at least 30 amino acids and more preferably at least 50 amino acids. 

As known in the art "similarity" between two enzymes is determined by comparing 
the amino acid sequence and its conserved amino acid substitutes of one enzyme to the 
sequence of a second enzyme. 

A variant, i.e. a "fragment", "analog" or "derivative" polypeptide, and reference 
polypeptide may differ in amino acid sequence by one or more substitutions, additions, 
deletions, fusions and truncations, which may be present in any combination. 

Among preferred variants are those that vary from a reference by conservative 
amino acid substitutions. Such substitutions are those that substitute a given amino acid in 
a polypeptide by another amino acid of like characteristics. Typically seen as conservative 
substitutions are the replacements, one for another, among the aliphatic amino acids Ala,
Val, Leu and Ile; interchange of the hydroxyl residues Ser and Thr; exchange of the acidic
residues Asp and Glu; substitution between the amide residues Asn and Gln; exchange of
the basic residues Lys and Arg; and replacements among the aromatic residues Phe and Tyr.
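
The groupings listed above lend themselves to a simple lookup. The Python sketch below is illustrative only; it encodes the six groups from the preceding paragraph using standard three-letter residue codes, and the function name is hypothetical.

```python
# Residue groups taken from the paragraph above (standard three-letter codes).
CONSERVATIVE_GROUPS = [
    {"Ala", "Val", "Leu", "Ile"},  # aliphatic
    {"Ser", "Thr"},                # hydroxyl
    {"Asp", "Glu"},                # acidic
    {"Asn", "Gln"},                # amide
    {"Lys", "Arg"},                # basic
    {"Phe", "Tyr"},                # aromatic
]

def is_conservative(original, replacement):
    """True if replacing `original` with `replacement` stays within one group."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)
```

Under this scheme a Leu-to-Ile change counts as conservative, whereas an Asp-to-Lys change (acidic to basic) does not.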

Most highly preferred are variants which retain the same biological function and
activity as the reference polypeptide from which they vary.

Fragments or portions of the enzymes of the present invention may be employed for 
producing the corresponding full-length enzyme by peptide synthesis; therefore, the 
fragments may be employed as intermediates for producing the full-length enzymes. 
Fragments or portions of the polynucleotides of the present invention may be used to 
synthesize full-length polynucleotides of the present invention. 

The present invention also relates to vectors which include polynucleotides of the 
present invention, host cells which are genetically engineered with vectors of the invention 
and the production of enzymes of the invention by recombinant techniques. 

Host cells are genetically engineered (transduced or transformed or transfected) with 
the vectors of this invention which may be, for example, a cloning vector or an expression 
vector. The vector may be, for example, in the form of a plasmid, a viral particle, a phage, 
etc. The engineered host cells can be cultured in conventional nutrient media modified as 
appropriate for activating promoters, selecting transformants or amplifying the genes of the 
present invention. The culture conditions, such as temperature, pH and the like, are those 
previously used with the host cell selected for expression, and will be apparent to the 
ordinarily skilled artisan. 

The polynucleotides of the present invention may be employed for producing 
enzymes by recombinant techniques. Thus, for example, the polynucleotide may be 
included in any one of a variety of expression vectors for expressing an enzyme. Such 
vectors include chromosomal, nonchromosomal and synthetic DNA sequences, e.g., 
derivatives of SV40; bacterial plasmids; phage DNA; baculovirus; yeast plasmids; vectors 
derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, 
adenovirus, fowl pox virus, and pseudorabies. However, any other vector may be used as
long as it is replicable and viable in the host.

The appropriate DNA sequence may be inserted into the vector by a variety of
procedures. In general, the DNA sequence is inserted into an appropriate restriction
endonuclease site(s) by procedures known in the art. Such procedures and others are
deemed to be within the scope of those skilled in the art.

The DNA sequence in the expression vector is operatively linked to an appropriate 
expression control sequence(s) (promoter) to direct mRNA synthesis. As representative 
examples of such promoters, there may be mentioned: LTR or SV40 promoter, the E. coli
lac or trp, the phage lambda PL promoter and other promoters known to control expression
of genes in prokaryotic or eukaryotic cells or their viruses. The expression vector also 
contains a ribosome binding site for translation initiation and a transcription terminator. 
The vector may also include appropriate sequences for amplifying expression. 

In addition, the expression vectors preferably contain one or more selectable marker 
genes to provide a phenotypic trait for selection of transformed host cells such as 
dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as 
tetracycline or ampicillin resistance in E. coli.

The vector containing the appropriate DNA sequence as hereinabove described, as 
well as an appropriate promoter or control sequence, may be employed to transform an 
appropriate host to permit the host to express the protein. 

As representative examples of appropriate hosts, there may be mentioned: bacterial 
cells, such as E. coli, Streptomyces, Bacillus subtilis; fungal cells, such as yeast; insect cells
such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS or Bowes 
melanoma; adenoviruses; plant cells, etc. The selection of an appropriate host is deemed 
to be within the scope of those skilled in the art from the teachings herein. 

More particularly, the present invention also includes recombinant constructs 
comprising one or more of the sequences as broadly described above. The constructs 
comprise a vector, such as a plasmid or viral vector, into which a sequence of the invention 
has been inserted, in a forward or reverse orientation. In a preferred aspect of this 
embodiment, the construct further comprises regulatory sequences, including, for example, 
a promoter, operably linked to the sequence. Large numbers of suitable vectors and
promoters are known to those of skill in the art, and are commercially available. The
following vectors are provided by way of example. Bacterial: pQE70, pQE60, pQE-9
(Qiagen), pD10, psiX174, pBluescript II KS, pNH8A, pNH16a, pNH18A, pNH46A
(Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic:
pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia).
However, any other plasmid or vector may be used as long as it is replicable and viable
in the host.

Promoter regions can be selected from any desired gene using CAT 
(chloramphenicol transferase) vectors or other vectors with selectable markers. Two 
appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include 
lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Eukaryotic promoters include CMV
immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and
mouse metallothionein-I. Selection of the appropriate vector and promoter is well within 
the level of ordinary skill in the art. 

In a further embodiment, the present invention relates to host cells containing the 
above-described constructs. The host cell can be a higher eukaryotic cell, such as a 
mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell can be a 
prokaryotic cell, such as a bacterial cell. Introduction of the construct into the host cell can 
be effected by calcium phosphate transfection, DEAE-Dextran mediated transfection, or 
electroporation (Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, 
(1986)). 

The constructs in host cells can be used in a conventional manner to produce the 
gene product encoded by the recombinant sequence. Alternatively, the enzymes of the 
invention can be synthetically produced by conventional peptide synthesizers. 

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells 
under the control of appropriate promoters. Cell-free translation systems can also be 
employed to produce such proteins using RNAs derived from the DNA constructs of the 
present invention. Appropriate cloning and expression vectors for use with prokaryotic and 
eukaryotic hosts are described by Sambrook, et al., Molecular Cloning: A Laboratory
Manual, Second Edition, Cold Spring Harbor, N.Y., (1989), the disclosure of which is
hereby incorporated by reference.

Transcription of the DNA encoding the enzymes of the present invention by higher 
eukaryotes is increased by inserting an enhancer sequence into the vector. Enhancers are 
cis-acting elements of DNA, usually about from 10 to 300 bp, that act on a promoter to
increase its transcription. Examples include the SV40 enhancer on the late side of the
replication origin bp 100 to 270, a cytomegalovirus early promoter enhancer, the polyoma
enhancer on the late side of the replication origin, and adenovirus enhancers. 

Generally, recombinant expression vectors will include origins of replication and 
selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance
gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly-
expressed gene to direct transcription of a downstream structural sequence. Such promoters
can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate 
kinase (PGK), a-factor, acid phosphatase, or heat shock proteins, among others. The 
heterologous structural sequence is assembled in appropriate phase with translation 
initiation and termination sequences, and preferably, a leader sequence capable of directing 
secretion of translated enzyme. Optionally, the heterologous sequence can encode a fusion
enzyme including an N-terminal identification peptide imparting desired characteristics, 
e.g., stabilization or simplified purification of expressed recombinant product. 

Useful expression vectors for bacterial use are constructed by inserting a structural 
DNA sequence encoding a desired protein together with suitable translation initiation and 
termination signals in operable reading phase with a functional promoter. The vector will
comprise one or more phenotypic selectable markers and an origin of replication to ensure
maintenance of the vector and to, if desirable, provide amplification within the host.
Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella
typhimurium and various species within the genera Pseudomonas, Streptomyces, and
Staphylococcus, although others may also be employed as a matter of choice. 

As a representative but nonlimiting example, useful expression vectors for bacterial 
use can comprise a selectable marker and bacterial origin of replication derived from
commercially available plasmids comprising genetic elements of the well known cloning
vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3
(Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM1 (Promega Biotec, Madison, WI,
USA). These pBR322 "backbone" sections are combined with an appropriate promoter and
the structural sequence to be expressed. 

Following transformation of a suitable host strain and growth of the host strain to 
an appropriate cell density, the selected promoter is induced by appropriate means (e.g., 
temperature shift or chemical induction) and cells are cultured for an additional period. 

Cells are typically harvested by centrifugation, disrupted by physical or chemical 
means, and the resulting crude extract retained for further purification.

Microbial cells employed in expression of proteins can be disrupted by any 
convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or 
use of cell lysing agents; such methods are well known to those skilled in the art.

Various mammalian cell culture systems can also be employed to express 
recombinant protein. Examples of mammalian expression systems include the COS-7 lines 
of monkey kidney fibroblasts, described by Gluzman, Cell, 23:175 (1981), and other cell 
lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa 
and BHK cell lines. Mammalian expression vectors will comprise an origin of replication, 
a suitable promoter and enhancer, and also any necessary ribosome binding sites, 
polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, 
and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 splice
and polyadenylation sites may be used to provide the required nontranscribed genetic
elements. 

The enzyme can be recovered and purified from recombinant cell cultures by 
methods including ammonium sulfate or ethanol precipitation, acid extraction, anion or 
cation exchange chromatography, phosphocellulose chromatography, hydrophobic 
interaction chromatography, affinity chromatography, hydroxylapatite chromatography and 
lectin chromatography. Protein refolding steps can be used, as necessary, in completing
configuration of the mature protein. Finally, high performance liquid chromatography
(HPLC) can be employed for final purification steps.

The enzymes of the present invention may be a naturally purified product, or a 
product of chemical synthetic procedures, or produced by recombinant techniques from a 
prokaryotic or eukaryotic host (for example, by bacterial, yeast, higher plant, insect and 
mammalian cells in culture). Depending upon the host employed in a recombinant 
production procedure, the enzymes of the present invention may be glycosylated or may be 
non-glycosylated. Enzymes of the invention may or may not also include an initial 
methionine amino acid residue. 

β-Galactosidase hydrolyzes lactose to galactose and glucose. Accordingly, the
OC1/4V, 9N2-31B/G, AEDII12RA-18B/G and F1-12G enzymes may be employed in the
food processing industry for the production of low lactose content milk and for the
production of galactose or glucose from lactose contained in whey obtained in a large
amount as a by-product in the production of cheese. Generally, it is desired that enzymes
used in food processing, such as the aforementioned β-galactosidases, be stable at elevated
temperatures to help prevent microbial contamination.

These enzymes may also be employed in the pharmaceutical industry. The enzymes 
are used to treat intolerance to lactose. In this case, a thermostable enzyme is desired, as
well. Thermostable β-galactosidases also have uses in diagnostic applications, where they
are employed as reporter molecules. 

Glucosidases act on soluble cellooligosaccharides from the non-reducing end to give
glucose as the sole product. Glucanases (endo- and exo-) act in the depolymerization of
cellulose, generating more non-reducing ends (endo-glucanases, for instance, act on internal
linkages yielding cellobiose, glucose and cellooligosaccharides as products). β-Glucosidases
are used in applications where glucose is the desired product. Accordingly,
M11TL, F1-12G, GC74-22G, MSB8-6G, OC1/4V, VC1-7G1, 9N2-31B/G and
AEDII12RA-18B/G may be employed in a wide variety of industrial applications, including
in corn wet milling for the separation of starch and gluten, in the fruit industry for
clarification and equipment maintenance, in baking for viscosity reduction, in the textile
industry for the processing of blue jeans, and in the detergent industry as an additive. For
these and other applications, thermostable enzymes are desirable.

Antibodies generated against the enzymes corresponding to a sequence of the 
present invention can be obtained by direct injection of the enzymes into an animal or by 
administering the enzymes to an animal, preferably a nonhuman. The antibody so obtained 
will then bind the enzymes themselves. In this manner, even a sequence encoding only a
fragment of the enzymes can be used to generate antibodies binding the whole native 
enzymes. Such antibodies can then be used to isolate the enzyme from cells expressing that 
enzyme. 

For preparation of monoclonal antibodies, any technique which provides antibodies 
produced by continuous cell line cultures can be used. Examples include the hybridoma 
technique (Kohler and Milstein, 1975, Nature, 256:495-497), the trioma technique, the 
human B-cell hybridoma technique (Kozbor et al., 1983, Immunology Today 4:72), and the
EBV-hybridoma technique to produce human monoclonal antibodies (Cole, et al., 1985, in
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). 

Techniques described for the production of single chain antibodies (U.S. Patent 
4,946,778) can be adapted to produce single chain antibodies to immunogenic enzyme 
products of this invention. Also, transgenic mice may be used to express humanized 
antibodies to immunogenic enzyme products of this invention. 

Antibodies generated against the enzyme of the present invention may be used in 
screening for similar enzymes from other organisms and samples. Such screening 
techniques are known in the art; for example, one such screening assay is described in
"Methods for Measuring Cellulase Activities", Methods in Enzymology, Vol 160, pp. 87-116,
which is hereby incorporated by reference in its entirety.

The present invention will be further described with reference to the following 
examples; however, it is to be understood that the present invention is not limited to such 
examples. All parts or amounts, unless otherwise specified, are by weight. 

In order to facilitate understanding of the following examples certain frequently
occurring methods and/or terms will be described.

"Plasmids" are designated by a lower case p preceded and/or followed by capital
letters and/or numbers. The starting plasmids herein are either commercially available, 
publicly available on an unrestricted basis, or can be constructed from available plasmids 
in accord with published procedures. In addition, equivalent plasmids to those described 
are known in the art and will be apparent to the ordinarily skilled artisan. 

"Digestion" of DNA refers to catalytic cleavage of the DNA with a restriction 
enzyme that acts only at certain sequences in the DNA. The various restriction enzymes 
used herein are commercially available and their reaction conditions, cofactors and other 
requirements were used as would be known to the ordinarily skilled artisan. For analytical 
purposes, typically 1 µg of plasmid or DNA fragment is used with about 2 units of enzyme
in about 20 µl of buffer solution. For the purpose of isolating DNA fragments for plasmid
construction, typically 5 to 50 µg of DNA are digested with 20 to 250 units of enzyme in
a larger volume. Appropriate buffers and substrate amounts for particular restriction
enzymes are specified by the manufacturer. Incubation times of about 1 hour at 37°C are
ordinarily used, but may vary in accordance with the supplier's instructions. After digestion
the reaction is electrophoresed directly on a polyacrylamide gel to isolate the desired
fragment. 
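
Because a restriction enzyme acts only at certain sequences, candidate cleavage sites in a DNA string can be located by simple pattern matching. The Python sketch below is illustrative only: the recognition sequences are the commonly published ones for six enzymes, the example fragment is hypothetical, and the function name is not from the source. All six sites are palindromic, so scanning one strand is sufficient.

```python
# Commonly published recognition sequences (all palindromic).
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "KpnI": "GGTACC",
    "BglII": "AGATCT",
    "SphI": "GCATGC",
}

def find_sites(dna):
    """Map each enzyme to the 0-based start positions of its site in `dna`."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start)
            start = dna.find(site, start + 1)  # allow overlapping matches
        if positions:
            hits[enzyme] = positions
    return hits

# Hypothetical 20-base fragment carrying a single EcoRI site.
example = "CCGAGAATTCATTAAAGAGG"
print(find_sites(example))
```

A lookup of this kind only reports where a recognition sequence occurs; actual digestion conditions remain those specified by the enzyme supplier, as stated above.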

Size separation of the cleaved fragments is performed using 8 percent
polyacrylamide gel described by Goeddel, D. et al., Nucleic Acids Res., 8:4057 (1980).

"Oligonucleotides" refers to either a single stranded polydeoxynucleotide or two 
complementary polydeoxynucleotide strands which may be chemically synthesized. Such
synthetic oligonucleotides have no 5' phosphate and thus will not ligate to another 
oligonucleotide without adding a phosphate with an ATP in the presence of a kinase. A 
synthetic oligonucleotide will ligate to a fragment that has not been dephosphorylated. 

"Ligation" refers to the process of forming phosphodiester bonds between two 
double stranded nucleic acid fragments (Maniatis, T., et al., Id., p. 146). Unless otherwise 
provided, ligation may be accomplished using known buffers and conditions with 10 units
of T4 DNA ligase ("ligase") per 0.5 µg of approximately equimolar amounts of the DNA
fragments to be ligated.

Unless otherwise stated, transformation was performed as described in the method
of Graham, F. and Van der Eb, A., Virology, 52:456-457 (1973).

Example 1 

Bacterial Expression and Purification of Glycosidase Enzymes

DNA encoding the enzymes of the present invention, SEQ ID NOS: 1-14 and 57-60,
was initially amplified from a pBluescript vector containing the DNA by the PCR
technique using the primers noted herein. The amplified sequences were then inserted into
the respective pQE vector listed beneath the primer sequences, and the enzyme was
expressed according to the protocols set forth herein. The 5' and 3' primer sequences for 
the respective genes are as follows: 

Thermococcus AEDII12RA-18B/G

5' CCGAGAATTCATTAAAGAGGAGAAATTAACTATGGTOAATGCTATGAITGTC 3' (SEQ ID NO:29)
3' CGGAAGATCTTCATAGCTCCGGAAGCCCATA 5' (SEQ ID NO:30)

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3' Bgl
II.



OC1/4V-33B/G

5' CCGAGAATTCATTAAAGAGGAGAAATTAACTATGATAAGAAGGTCCGATnTCC 3'
(SEQ ID NO:31)

3' CGGAAGATCnTAAGA-rnTAGAAA1TCCTT 5' (SEQ ID NO:32)

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3' Bgl
II.



Thermococcus 9N2-31B/G

5' CCGAGAATTCATTAAAGAGGAGAAATTAACTATGCTACCAGAAGGCTTTCTC 3'
(SEQ ID NO:33)

3' CGGAGGTACCTCACCCAAGTCCGAACTTCTC 5' (SEQ ID NO:34)

Vector: pQE30; and contains the following restriction enzyme sites 5' EcoRI and 3'
KpnI.

Staphylothermus marinus F1-12G

5' CCGAGAATTCATTAAAGAGGAGAAATTAACTATCATAAGGTTTCCTGATTAT 3'
(SEQ ID NO:35)

3' CGGAAGATCrTTATTCCAGGrrcnTAATCC 5' (SEQ ID NO:36)

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3' Bgl
II.

Thermococcus chitonophagus GC74 - 22G 

5' CCGAGAATTCATTCATTAAAGAGGAGAAATTAACTATGCTTCCAGGAGAACTT^ 3' 
(SEQ ID NO:37) 

3' CGGAGGATCCCTACCCCTCCTCTAAGATCTC 5' (SEQ ID NO:38) 

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3' 
BamHI. 

MllTL 

5' AATAATCTAGAGCATGCAATTCCCCAAAGACTTCATGATAG 3' (SEQ ID NO:39) 
3' AATAAAAGCTTACTGGATCAGTGTAAGATGCT 5' (SEQ ID NO:40) 

Vector: pQE70; and contains the following restriction enzyme sites 5' SphI and 3' Hind 
III. 

Thermotoga maritima MSB8-6G 

5' CCGACAATTGATTAAAGAGGAGAAATTAACTATGGAAAGGATCGATGAAATT 3' (SEQ ID NO:41) 
3' CGGAGGTACCTCATGGTTTGAATCTCTTCTC 5' (SEQ ID NO:42) 

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3' 
KpnI. 

Pyrococcus furiosus VC1 - 7G1 

5' CCGACAATTGATTAAAGAGGAGAAATTAACTATGTTCCCTGAAAAGTTCCTT 3' (SEQ ID NO:43) 
3' CGGAGGTACCTCATCCCCTCAGCAATTCCTC 5' (SEQ ID NO:44) 

Vector: pQE12; and contains the following restriction enzyme sites 5' EcoRI and 3' Kpn 
I. 

Bankia gouldi endoglucanase (37GP1) 

5' AATAAGGATCCCnTAGCGACGCTCGC 3' (SEQ ID NO:45) 

3' AATAAAAGCTTCCGGGTTGTACAGCGGTAATAGGC 5' (SEQ ID NO:46) 

Vector: pQE52; and contains the following restriction enzyme sites 5' BamHI and 3' 
Hind III. 



Thermotoga maritima α-galactosidase (6GC2) 

5' nTATTGAATTCATTAAAGAGGAGAAATTAACTATGATCTGTGTGGAAATATTCGGAAAG 3' 
(SEQ ID NO:47) 

3' TCTATAAAGCTTTCATTCTCTCTCACCCTCTTCCTAGAAG 5' (SEQ ID NO:48) 

Vector: pQET; and contains the following restriction enzyme sites 5' EcoRI and 3' Hind 
III. 



Thermotoga maritima β-mannanase (6GP2) 

5' nTATTCAATTGATTAAAGAGGAGAAATTAACTATGGGGATTGGTGGCGACGAC 3' 
(SEQ ID NO:49) 

3' nTATTAAGCTTATCTnTCATATTCACATACCTCC 5' (SEQ ID NO:50) 

Vector: pQEt; and contains the following restriction enzyme sites 5' Hind III and 3' 
EcoRI. 

AEPII1a β-mannanase (63GB1) 

5' nTATTGAATTCATTAAAGAGGAGAAATTAACTATGCTACCAGAAGAGTTCCTATGGGGC 3' 
(SEQ ID NO:51) 

3' nTATTAAGCTTCTCATCAACGGCTATGGTCTTCATTTC 5' (SEQ ID NO:52) 

Vector: pQEt; and contains the following restriction enzyme sites 5' Hind III and 3' 
EcoRI. 



OC1/4V endoglucanase (33GP1) 

5' AAAAAACAATTGAATTCATTAAAGAGGAGAAATTAACTATGGTAGAAAGACACTTCAGATATGTT( 3' 
(SEQ ID NO:53) 

3' rrnrCGGATCCAATTCTTCATTTACTCTTTGCCTG 5' (SEQ ID NO:54) 

Vector: pQEt; and contains the following restriction enzyme sites 5' BamHI and 3' 
EcoRI. 

Thermotoga maritima pullulanase (6GP3) 

5' TTTTGGAATTCATTAAAGAGGAGAAATTAACTATGGAACTGATCATAGAAGGTTAC 3' 
(SEQ ID NO:55) 

3' ATAAGAAGCTTTTCACTCTCTGTACAGAACGTACGC 5' (SEQ ID NO:56) 

Vector: pQEt; and contains the following restriction enzyme sites 5' EcoRI and 3' Hind 
III. 

The restriction enzyme sites indicated correspond to the restriction enzyme sites on 
the bacterial expression vector indicated for the respective gene (Qiagen, Inc. Chatsworth, 
CA). The pQE vector encodes antibiotic resistance (Amp^r), a bacterial origin of replication 
(ori), an IPTG-regulatable promoter operator (P/O), a ribosome binding site (RBS), a 6-His 
tag and restriction enzyme sites. 
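
By way of illustration only, the primer features described in this example (a restriction enzyme site, a ribosome binding site, and the ATG start codon) can be located programmatically. The Python sketch below scans the primer of SEQ ID NO:33 as listed above. The function name, the table of recognition sequences, and the treatment of AAAGAGGAGAAA as the RBS motif shared by the 5' primers are assumptions made for this sketch, not part of the original disclosure.

```python
# Recognition sequences (written 5'->3') for enzymes named in Example 1.
SITES = {
    "EcoRI": "GAATTC",
    "BglII": "AGATCT",
    "KpnI": "GGTACC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "SphI": "GCATGC",
}
RBS = "AAAGAGGAGAAA"  # assumed RBS motif common to the 5' primers


def primer_features(primer):
    """Return 0-based positions of known sites, the RBS, and the first ATG after the RBS."""
    seq = primer.upper().replace(" ", "")
    sites = {name: seq.find(site) for name, site in SITES.items() if site in seq}
    rbs_at = seq.find(RBS)
    atg_at = seq.find("ATG", rbs_at + len(RBS)) if rbs_at >= 0 else -1
    return {"sites": sites, "rbs": rbs_at, "start_codon": atg_at}


seq33 = "CCGAGAATTCATTAAAGAGGAGAAATTAACTATGCTACCAGAAGGCTTTCTC"  # SEQ ID NO:33
seq34 = "CGGAGGTACCTCACCCAAGTCCGAACTTCTC"  # SEQ ID NO:34, as printed

info33 = primer_features(seq33)
info34 = primer_features(seq34)
print(info33)
print(info34)
```

Because the recognition sequences used here are palindromic, a plain substring search finds them regardless of which strand the primer is written on.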

The pQE vector was digested with the restriction enzymes indicated. The amplified 
sequences were ligated into the respective pQE vector and inserted in frame with the 
sequence encoding for the RBS. The ligation mixture was then used to transform the E. coli 
strain M15/pREP4 (Qiagen, Inc.) by electroporation. M15/pREP4 contains multiple copies 
of the plasmid pREP4, which expresses the lacI repressor and also confers kanamycin 
resistance (Kan^r). Transformants were identified by their ability to grow on LB plates, and 
ampicillin/kanamycin resistant colonies were selected. Plasmid DNA was isolated and 
confirmed by restriction analysis. Clones containing the desired constructs were grown 
overnight (O/N) in liquid culture in LB media supplemented with both Amp (100 µg/ml) 
and Kan (25 µg/ml). The O/N culture was used to inoculate a large culture at a ratio of 
1:100 to 1:250. The cells were grown to an optical density at 600 nm (O.D.600) of between 0.4 and 
0.6. IPTG ("Isopropyl-β-D-thiogalactopyranoside") was then added to a final 
concentration of 1 mM. IPTG induces by inactivating the lacI repressor, clearing the P/O 
and leading to increased gene expression. Cells were grown an extra 3 to 4 hours. Cells were 
then harvested by centrifugation. 
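
The inoculation ratio and the IPTG concentration given above translate into working volumes by simple proportion. The following Python sketch is one way to compute them. The 1 liter culture volume and the 1 M IPTG stock are hypothetical values chosen for the example; the text does not specify either.

```python
def inoculum_volume_ml(culture_ml, ratio):
    """Volume of overnight culture needed for a 1:ratio inoculation."""
    return culture_ml / ratio


def iptg_stock_volume_ml(culture_ml, final_mM=1.0, stock_mM=1000.0):
    """Volume of IPTG stock from C1*V1 = C2*V2, ignoring the small added volume."""
    return culture_ml * final_mM / stock_mM


culture_ml = 1000.0  # hypothetical large-culture volume
inoculum_min = inoculum_volume_ml(culture_ml, 250)  # most dilute inoculation (1:250)
inoculum_max = inoculum_volume_ml(culture_ml, 100)  # least dilute inoculation (1:100)
iptg_ml = iptg_stock_volume_ml(culture_ml)          # for 1 mM final from a 1 M stock

print(f"O/N culture: {inoculum_min} to {inoculum_max} ml")
print(f"IPTG stock: {iptg_ml} ml")
```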

The primer sequences set out above may also be employed to isolate the target gene 
from the deposited material by hybridization techniques described above. 

Example 2 

Isolation of a Selected Clone From the Deposited Genomic Clones 

A clone is isolated directly by screening the deposited material using the 
oligonucleotide primers set forth in Example 1 for the particular gene desired to be 
isolated. The specific oligonucleotides are synthesized using an Applied Biosystems 
DNA synthesizer. The oligonucleotides are labeled with 32P-γ-ATP using T4 
polynucleotide kinase and purified according to a standard protocol (Maniatis et al., 
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, NY, 
1982). The deposited clones in the pBluescript vectors may be employed to transform 
bacterial hosts which are then plated on 1.5% agar plates to the density of 20,000- 
50,000 pfu/150 mm plate. These plates are screened using Nylon membranes according 
to the standard screening protocol (Stratagene, 1993). Specifically, the Nylon 
membrane with denatured and fixed DNA is prehybridized in 6 x SSC, 20 mM 
NaH2PO4, 0.4% SDS, 5 x Denhardt's, 500 µg/ml denatured, sonicated salmon sperm 
DNA; and 6 x SSC, 0.1% SDS. After one hour of prehybridization, the membrane is 
hybridized with hybridization buffer (6 x SSC, 20 mM NaH2PO4, 0.4% SDS, 500 µg/ml 
denatured, sonicated salmon sperm DNA) with 1x10* cpm/ml 32P-probe overnight at 
42°C. The membrane is washed at 45-50°C with washing buffer (6 x SSC, 0.1% SDS) 
for 20-30 minutes, dried, and exposed to Kodak X-ray film overnight. Positive clones are 
isolated and purified by secondary and tertiary screening. The purified clone is 
sequenced to verify its identity to the primer sequence. 

Once the clone is isolated, the two oligonucleotide primers corresponding to the 
gene of interest are used to amplify the gene from the deposited material. A polymerase 
chain reaction is carried out in 25 µl of reaction mixture with 0.5 µg of the DNA of the 
gene of interest. The reaction mixture is 1.5-5 mM MgCl2, 0.01% (w/v) gelatin, 20 µM 
each of dATP, dCTP, dGTP, dTTP, 25 pmol of each primer and 0.25 Unit of Taq 
polymerase. Thirty five cycles of PCR (denaturation at 94°C for 1 min; annealing at 
55 °C for 1 min; elongation at 72 °C for 1 min) are performed with the Perkin-Elmer 
Cetus automated thermal cycler. The amplified product is analyzed by agarose gel 
electrophoresis and the DNA band with expected molecular weight is excised and 
purified. The PCR product is verified to be the gene of interest by subcloning and 
sequencing the DNA product. The ends of the newly purified genes are nucleotide 
sequenced to identify full length sequences. Complete sequencing of full length genes is 
then performed by Exonuclease III digestion or primer walking. 
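
The cycling conditions and reagent concentrations described above lend themselves to a quick arithmetic check. The Python sketch below represents the three-step cycle as data, totals the programmed hold time for thirty five cycles, and converts the 20 µM dNTP concentration into an absolute amount in the 25 µl reaction. The class and function names are hypothetical, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


# Cycle as described in Example 2.
CYCLE = (
    Step("denaturation", 94, 1),
    Step("annealing", 55, 1),
    Step("elongation", 72, 1),
)
N_CYCLES = 35


def total_hold_minutes(cycle, n_cycles):
    """Sum of programmed hold times; excludes ramping between temperatures."""
    return n_cycles * sum(step.minutes for step in cycle)


def amount_pmol(conc_uM, volume_ul):
    """Micromolar times microliters gives picomoles."""
    return conc_uM * volume_ul


hold = total_hold_minutes(CYCLE, N_CYCLES)
dntp_each = amount_pmol(20, 25)
print(f"Programmed hold time: {hold} min")
print(f"Each dNTP in the reaction: {dntp_each} pmol")
```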

Example 3 
Screening for Galactosidase Activity 
Screening for α-galactosidase protein activity may be carried out as 
follows: 

Substrate plates were provided by a standard plating procedure. Dilute XL1- 
Blue MRF' E. coli host (Stratagene Cloning Systems, La Jolla, CA) to O.D.600 = 1.0 
with NZY media. In 15 ml tubes, inoculate 200 µl diluted host cells with phage. Mix 
gently and incubate tubes at 37 °C for 15 min. Add approximately 3.5 ml LB top 
agarose (0.7%) containing 1 mM IPTG to each tube and pour onto an NZY plate surface. 
Allow to cool and incubate at 37 °C overnight. The assay plates contain the 
substrate p-nitrophenyl α-galactoside (Sigma) (200 mg/100 ml) in 100 mM NaCl, 100 
mM potassium phosphate and 1% (w/v) agarose. The plaques are overlaid with 
nitrocellulose and incubated at 4 °C for 30 minutes, whereupon the nitrocellulose is 
removed and overlaid onto the substrate plates. The substrate plates are then incubated 
at 70 °C for 20 minutes. 
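
For orientation, the substrate loading of 200 mg per 100 ml can be converted to a molar concentration. The Python sketch below does this for the p-nitrophenyl α-galactoside substrate, assuming a molecular weight of 301.25 g/mol (formula C12H15NO8). That value and the function name are supplied for this illustration and do not appear in the original text.

```python
MW_PNP_ALPHA_GAL = 301.25  # g/mol, assumed for C12H15NO8


def millimolar(mass_mg, volume_ml, mw_g_per_mol):
    """mg/ml equals g/L; dividing by g/mol gives mol/L, scaled here to mM."""
    grams_per_liter = mass_mg / volume_ml
    return grams_per_liter / mw_g_per_mol * 1000.0


substrate_mM = millimolar(200.0, 100.0, MW_PNP_ALPHA_GAL)
print(f"Substrate concentration: {substrate_mM:.2f} mM")
```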

Example 4 

Screening of Clones for Mannanase Activity 

A solid phase screening assay was utilized as a primary screening method to test 
clones for β-mannanase activity. 

A culture solution of the Y1090 E. coli host strain (Stratagene Cloning Systems, 
La Jolla, CA) was diluted to O.D.600 = 1.0 with NZY media. The amplified library from 
the Thermotoga maritima lambda gt11 library was diluted in SM (phage dilution buffer): 5 
x 10' pfu/µl diluted 1:1000 then 1:100 to 5 x 10^ pfu/µl. Then 8 µl of phage dilution 
(5 x 10^ pfu/µl) was plated in 200 µl host cells. They were then incubated in 15 ml 
tubes at 37 °C for 15 minutes. 
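
The two-step dilution described above (1:1000 followed by 1:100) reduces the phage titer by a factor of 100,000. The Python sketch below works through the arithmetic. Because the exponents of the titers are not legible in this copy, the starting titer of 5 x 10^7 pfu/µl used here is purely a hypothetical value chosen for illustration; the function name is likewise an assumption.

```python
def serial_dilution(titer, factors):
    """Apply successive 1:factor dilutions to a starting titer."""
    for factor in factors:
        titer /= factor
    return titer


stock_titer = 5e7  # pfu/ul, HYPOTHETICAL: the exponent is illegible in the source
diluted_titer = serial_dilution(stock_titer, [1000, 100])
plaques_per_plate = diluted_titer * 8  # 8 ul of the final dilution is plated

print(f"Diluted titer: {diluted_titer} pfu/ul")
print(f"Expected plaques per plate: {plaques_per_plate}")
```

Under this assumed titer the plating density would be on the order of a few thousand plaques per plate, which is the same order as the density stated for the endoglucanase screen.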

Approximately 4 ml of molten LB top agarose (0.7%) at approximately 52 °C 
was added to each tube and the mixture was poured onto the surface of LB agar plates. 
The agar plates were then incubated at 37 °C for five hours. The plates were replicated 
and induced with 10 mM IPTG-soaked Duralon-UV™ nylon membranes (Stratagene 
Cloning Systems, La Jolla, CA) overnight. The nylon membranes and plates were 
marked with a needle to keep their orientation and the nylon membranes were then 
removed and stored at 4 °C. 

An Azo-galactomannan overlay was applied to the LB plates containing the 
lambda plaques. The overlay contains 1% agarose, 50 mM potassium-phosphate buffer 
pH 7, 0.4% Azocarob-galactomannan (Megazyme, Australia). The plates were 
incubated at 72 °C. The Azocarob-galactomannan treated plates were observed after 4 
hours then returned to incubation overnight. Putative positives were identified by 
clearing zones on the Azocarob-galactomannan plates. Two positive clones were 
observed. 

The nylon membranes referred to above, which correspond to the positive clones, 
were retrieved, oriented over the plate and the portions matching the locations of the 
clearing zones for positive clones were cut out. Phage was eluted from the membrane 
cut-out portions by soaking the individual portions in 500 µl SM (phage dilution buffer) 
and 25 µl CHCl3. 

Example 5 

Screening of Clones for Mannosidase Activity 

A solid phase screening assay was utilized as a primary screening method to test 
clones for β-mannosidase activity. 

A culture solution of the Y1090 E. coli host strain (Stratagene Cloning Systems, 
La Jolla, CA) was diluted to O.D.600 = 1.0 with NZY media. The amplified library from 
the AEPII 1a lambda gt11 library was diluted in SM (phage dilution buffer): 5 x 10' pfu/µl 
diluted 1:1000 then 1:100 to 5 x 10" pfu/µl. Then 8 µl of phage dilution 
(5 x 10- pfu/µl) was plated in 200 µl host cells. They were then incubated in 15 ml 
tubes at 37 °C for 15 minutes. 

Approximately 4 ml of molten, LB top agarose (0.7%) at approximately 52 °C 
was added to each tube and the mixture was poured onto the surface of LB agar plates. 
The agar plates were then incubated at 37 °C for five hours. The plates were replicated 
and induced with 10 mM IPTG-soaked Duralon-UV™ nylon membranes (Stratagene 
Cloning Systems, La Jolla, CA) overnight. The nylon membranes and plates were 
marked with a needle to keep their orientation and the nylon membranes were then 
removed and stored at 4 °C. 

A p-nitrophenyl-β-D-mannopyranoside overlay was applied to the LB plates 
containing the lambda plaques. The overlay contains 1% agarose, 50 mM potassium- 
phosphate buffer pH 7, 0.4% p-nitrophenyl-β-D-mannopyranoside (Megazyme, 
Australia). The plates were incubated at 72 °C. The p-nitrophenyl-β-D-mannopyranoside 
treated plates were observed after 4 hours then returned to incubation 
overnight. Putative positives were identified by clearing zones on the p-nitrophenyl-β- 
D-mannopyranoside plates. Two positive clones were observed. 

The nylon membranes referred to above, which correspond to the positive clones, 
were retrieved, oriented over the plate and the portions matching the locations of the 
clearing zones for positive clones were cut out. Phage was eluted from the membrane 
cut-out portions by soaking the individual portions in 500 µl SM (phage dilution buffer) 
and 25 µl CHCl3. 

Example 6 
Screening for Pullulanase Activity 

Screening for pullulanase protein activity may be carried out as 
follows: 

Substrate plates were provided by a standard plating procedure. Host cells are 
diluted to O.D.600 = 1.0 with NZY or appropriate media. In 15 ml tubes, inoculate 200 
µl diluted host cells with phage. Mix gently and incubate tubes at 37 °C for 15 min. 
Approximately 3.5 ml LB top agarose (0.7%) is added to each tube and the mixture 
is plated, allowed to cool, and incubated at 37 °C for about 28 hours. Overlays of 4.5 
ml of the following substrate are poured: 

100 ml total volume 

0.5 g    Red Pullulan (Megazyme, Australia) 
1.0 g    Agarose 
5 ml     Buffer (Tris-HCl pH 7.2 @ 75 °C) 
2 ml     5 M NaCl 
5 ml     CaCl2 (100 mM) 
85 ml    dH2O 

Plates are cooled at room temperature, and then incubated at 75°C for 2 hours. 
Positives are observed as showing substrate degradation. 
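
The overlay recipe above can be scaled to the number of plates being screened, and the final salt concentrations follow from the stock volumes. The Python sketch below is a minimal illustration. It assumes the NaCl stock is 5 M (the stock concentration is not clearly legible in this copy), and the dictionary keys, function names and the choice of 20 plates are hypothetical.

```python
# Amounts per 100 ml of overlay, as listed in Example 6.
RECIPE_PER_100_ML = {
    "Red Pullulan (g)": 0.5,
    "agarose (g)": 1.0,
    "Tris-HCl buffer (ml)": 5.0,
    "NaCl stock (ml)": 2.0,
    "CaCl2 100 mM (ml)": 5.0,
    "dH2O (ml)": 85.0,
}
OVERLAY_ML_PER_PLATE = 4.5


def scale_recipe(recipe, total_ml):
    """Scale every component linearly from the 100 ml reference batch."""
    return {name: amount * total_ml / 100.0 for name, amount in recipe.items()}


def final_conc_mM(stock_mM, stock_ml, total_ml=100.0):
    """Final concentration after diluting stock_ml of stock into total_ml."""
    return stock_mM * stock_ml / total_ml


n_plates = 20  # hypothetical number of plates
batch_ml = n_plates * OVERLAY_ML_PER_PLATE
batch = scale_recipe(RECIPE_PER_100_ML, batch_ml)
cacl2_mM = final_conc_mM(100.0, 5.0)
nacl_mM = final_conc_mM(5000.0, 2.0)  # ASSUMES a 5 M NaCl stock

print(f"Batch volume: {batch_ml} ml")
for name, amount in batch.items():
    print(f"  {name}: {amount:.2f}")
print(f"Final CaCl2: {cacl2_mM} mM; final NaCl (assumed 5 M stock): {nacl_mM} mM")
```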

Example 7 
Screening for Endoglucanase Activity 

Screening for endoglucanase protein activity may be carried out as 
follows: 

1. The gene library is plated onto 6 LB/GelRite/0.1% CMC/NZY agar plates 
(~4,800 plaque forming units/plate) in E. coli host with LB agarose as top agarose. The 
plates are incubated at 37°C overnight. 

2. Plates are chilled at 4°C for one hour. 

3. The plates are overlayed with Duralon membranes (Stratagene) at room 
temperature for one hour and the membranes are oriented and lifted off the plates and 
stored at 4 °C. 

4. The top agarose layer is removed and plates are incubated at 37°C for ~3 
hours. 

5. The plate surface is rinsed with NaCl. 

6. The plate is stained with 0.1% Congo Red for 15 minutes. 

7. The plate is destained with 1 M NaCl. 

8. The putative positives identified on plate are isolated from the Duralon 
membrane (positives are identified by clearing zones around clones). The phage is 
eluted from the membrane by incubating in 500 µl SM + 25 µl CHCl3. 

9. Insert DNA is subcloned into any appropriate cloning vector and 
subclones are reassayed for CMCase activity using the following protocol: 

i) Spin 1 ml overnight miniprep of clone at maximum speed for 3 
minutes. 

ii) Decant the supernatant and use it to fill "wells" that have been 
made in an LB/GelRite/0.1% CMC plate. 

iii) Incubate at 37°C for 2 hours. 

iv) Stain with 0.1% Congo Red for 15 minutes. 

v) Destain with 1 M NaCl for 15 minutes. 

vi) Identify positives by clearing zone around clone. 

Numerous modifications and variations of the present invention are possible in 
light of the above teachings and, therefore, within the scope of the appended claims, the 
invention may be practiced otherwise than as particularly described. 

WHAT IS CLAIMED IS: 

1. An isolated polynucleotide selected from the group consisting of: 

(a) SEQ ID NOS: 1-14 and 57-60; 

(b) SEQ ID NOS: 1-14 and 57-60, wherein T can also be U; 

(c) polynucleotide sequences complementary to SEQ ID NOS: 1-14 and 57- 
60; 

(d) polynucleotide sequences which encode an amino acid sequence as set 
forth in SEQ ID NOS:15-28, and 61-64; and 

(e) fragments of (a), (b), (c) or (d) that are at least 15 consecutive bases in 
length and that will selectively hybridize to DNA which encodes a 
polypeptide of SEQ ID NOS: 15-28, and 61-64. 

2. A vector comprising a polynucleotide of claim 1. 

3. A host cell containing the vector of claim 2. 

4. The method of claim 3, wherein the host cell is a eukaryotic cell. 

5. The method of claim 3, wherein the host cell is a prokaryotic cell. 

6. A method for producing a polypeptide comprising: 

(a) culturing the host cells of claim 3; 

(b) expressing from the host cell of claim 3 a polypeptide encoded by said 
polynucleotide; and 

(c) isolating the polypeptide. 

7. An enzyme selected from the group consisting of: 

(a) an enzyme comprising an amino acid sequence set forth in SEQ ID NOS: 
15-28 or 61-64; and 

(b) an enzyme which comprises at least 30 consecutive amino acid residue as 
an enzyme of (a). 

8. An enzyme of which at least a portion is coded for by a polynucleotide of 
claim 1, and which is selected from the group consisting of: 

(a) an enzyme comprising an amino acid sequence which is at least 70% 
identical to an amino acid sequence selected from the group of amino 
acid sequences set forth in SEQ ID NOS:15-28 or 61-64; and 

(b) an enzyme which comprises at least 30 amino acid residues to the 
enzyme of (a). 



9. A method for generating glucose from soluble cell oligosaccharides comprising 
contacting a sample containing oligosaccharides with an effective amount of an 
enzyme selected from the group consisting of an enzyme having the amino acid 
sequence set forth in SEQ ID NOS: 15-28, 61-63 and 64 such that glucose is 
produced. 

10. The method of claim 9, wherein the sample is selected from the group consisting 
of dairy products, fruit juices, detergents, textiles, guar gum, animal feed, plant 
biomass and waste products. 

11. The method of claim 9, wherein the oligosaccharide is selected from the group 
consisting of maltose, cellobiose, lactose, sucrose, raffinose, stachyose, 
verbascose, cellulose, starch, amylose, glycogen, disaccharides, polysaccharides 
and pullulan. 

STAPHYLOTHERMUS MARINUS GLYCOSIDASE - 12G 
COMPLETE GENE SEQUENCE 
9/95 
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i2()r GAc; gaC taC aTa aaa aac; ato a(;a (jaa aca cag gaa tat aaa rcc a<;a Af*r cac ht tcc i2«i 

401 Clo Glu Tyr He Ly* Ly» Mci Arp Clu Thf Clu Ctu Tyr l.y\ pf,. Afp fhf Asp Set Trp 4t(( 

I2M CCA ACC CTC ATA AAA CCC AAA CTC CCA CAG AAT TTC CTC TCA GAA AAA CAC ATA AAC AAA (J^o 

411 Cly Tlu Viil lie Lyi Pri» Lys Leu Pro Clu Am Phe Leu Set Glu Lys Clu lie Ly:^ Lys 440 

IJ21 CCT CCA AAC AAA AAC GAT CU CCA CTT CTT CTC ATC AC7T ACC ATC TCC CCT CAC CCA TAC 13/10 

441 Pmi Prn Ly» Ly* A»n Aip Vul AIm V«| Vit V»| lie Set Afj lie Scr Cly Glu Cly Tyr 4A0 

1341 CAC ACA AAC CCC CTC AAA CCT CaC TTC TAC CTC TCC CAT CAC CAC CTC CAA CTC ATA AAA 1440 

461 Ajtp Art Lya Pro V*l Ly« Cly Asp Phc Tyr L«i Scr Afp A,ip Clu Uu Clu Leu tic Ly» 4M 

1441 ACC CTC TCC AAA CAA TTC CAC CAT CAG CCT AAC AAA CTT CTG CTT CTT CTC AAC ATC CCA tJOO 

481 Tilt V«l Set Lyi Clu Phe Hts Ajp Gin C)y Lys Lyi Vkl Vil Vtl Uu Uu Ain Uc Cly 300 

IMI ACT CCC ATC CAA CTC CCA ACC TCC ACA ' CAC CTT CTC CAT CCA ATT CTT CTC CTC TCC CAC 1360 

50] Ser Pro Ik Clu Vtl Ala Scr Trp Art Asp Uu Vtl Asp Cly fie Uu Uu Vil Trp Cln 520 

1561 CCC CCA CAG CAC ATC CCA ACA ATA CTC CCC CAT CTT CTT CTG CCA AAC ATT AAT CCC TCC 1620 

321 AU Cly Cln Glu Met Cly Art He Vtl AU Ajp Vtl Leu Vtl Cly Lyi tie Am Pro Ser 540 

1621 CCA AAA CTT CCA ACO ACC TTC CCQ AAG CAT TAC TCC CAC CTT CCA TCC TCC ACC TTC CCA 1680 

541 Cly Lys Uu Pro Thr Thr Phe Pro Lyi Aip Tyr Scr Asp Vtl Pro Ser Trp Thr Phc Pro 560 

1681 CCA CAC CCA AAG CAC AAT CCC CAA ACA CTG CTC TaC CAC CAA CAC ATC TAC CTC CCA TAC 1740 

561. Cly Clu Pro Lyi Aip ASA Pro Cln Arj Vil Vij Tyr Clu CJu Aip lie Tyr Vil Cly , Tyr 580 

174 1 ACC TAC TAC CAC ACC TTC CCT CTG CAA CCT CCC TAC CAA TTC CCC TAC CCC CTC TCT TAC 1800 

581 Aft Tyr Tyr A*p T>»r Phe Cly Vtl Clu Pro Alt Tyr Cltt Phc Ciy Tyr Cly Uu Scr Tyr 600 

1801 ACA AAG TTT CAA TAC AAA CAT TTA AAA ATC CCT ATC CAC CCT CAC ACC CTC ACA CTC TCC 1860 

601 Thr Ly> Phc Clu Tyr Lys Aap Uu Lyi tie AU lie Ajp Cly Glu Thr Uu Art Vtl Scr 620 

1861 TAC ACC ATC ACA AAC ACT COG CAC ACA CCT CCA AAC CAA CTC TCA CAC CTC TAC ATC AAA 1920 

621 Tyr Thr lie Thr Am Thr Cly Axp Art C'y Lyi Clu Vtl Ser Cln V»| Tyr Ue Lyi . 640 

1921 CCT CCA AAA CCA AAA ATA CAC AAA CCC TTC CAC CAC CTC AAA CCC TTT CAC AAA ACA AAA 1980 

641 AU Pro Lyi Cly Lyi He Aip Lyi Pro Phe Cln Clu Uu Lyi Alt Phe His Lyi Tlir Ly$ 660 

1981 CTT TTC AAC CCC CCT GAA TCA CAA CAA ATC TCC TTC CAA ATT CCT CTC ACA CAT CTT CCC 2040 

661 Uu Uu All) Pro Cly Clu Ser Clu Clu Uc Ser Uu CIh Itc Prn Uu Art Aip Uu Alt 680 

2041 ACT TTC CAT CCC AAA CAA TCC CTT CTC CAC TCA CCA CAA TAC GAG CTC ACC CTC CCT CCA 2100 

681 Ser Phc Asp Gly Lyi Clu Trp Vil Vtl Clo Ser Cly Clu Tyr Clu V.l Art Vtl Cly Alt 700 

2101 TCT TCC ACC CAT ATA ACC TTC ACA CAT ATT TTT CTC CTT CAC CCA CAG AAC ACA TTC AAA 2160 

701 Ser Ser Art Aip lie Art Lea Art Aip lie Phe Uu Vnl Clu Cly Clu Lys Arj Phe Lyi 720 

2161 CCA TCA 2166 
721 Pra End 722 

THERMOCOCCUS CHITONOPHAGUS GLYCOSIDASE - 22G 
COMPLETE SEQUENCE - 9/95 

I 7J! ^ p'''' Trr CTC TCC CCA CTT TCA CAC TCC CCA TTC CAC TTT CAA ATC CUC 

X Net Leu Pro Clu A*n Phe Leu Trp Cly v*l Ser Cln Ser Cly Phe Cin Phe Clu Net Cl>. 

21 A,p Arg L.u Arfl Arg Hi, U. A3p Pro A,n Thr A.p Trp Trp Tyr Trp Val Arg Asp Clu 

I^*^ ^^'^ ^T^ ^ ^ ^^"^ CAA CAC CCT ATA AAT TCA TAT 

41 Tyr A.n 11. Lys Ly. Cly L.u V.l Ser Cly A.p Uu Pro Clu A.p Cly He Asl lit l^r 

181 CAA TTA TAT CAC ACA CAC CAA CAA ATT CCA AAC CAT TTA CCC CTC AAC ACA TAT ACC ATr 
61 Clu Leu Tyr Clu Ary Asp Cln Clu He Ala Ly. Asp L.u Cly ^^n ^'r ^1 Arg He 

^li r7 ^ CTA rrr cca tcc cca acc act ttt ctc gac ctc cac tat caa 

81 Cly lie Clu Trp S.r Arg V.l Phe Pro Trp Pro Thr Thr Ph. Val Asp Val «u ^ 

ll\ tH !^ CCG TO CTA AAC CAT CTC AAC ATT TCT AAA CAC CCA TTA CAA AAA 

lO: n. ASP Ciu ,S.r Tyr Cly L.u V^l Ly. Asp Val Ly. II. S.r Ly. Asp Ala L.u Clu 

3 61 CTT CAT CAA ATC CCT AAC CAA ACC CAA ATA ATA TAT TAT ACC AAC CTA ATA AAT TCC CTA 
121 L.u Asp Clu II. AU Asn Cln Arg Clu II. tl. Tyr Tyr Arg A.n lIu iIC Hr 

Kl Arg Ly, Arg Cly Ph. Ly. V»l II. L.u Asn Leu Asn Hi. Phe Thr L.u Pro 11. Trp L.u 

III ^ CAA AAA CCC CTC ACC AAT AAC ACA AAC CCA TCC CTA ACC 

161 Hi, A»p Pro II. Clu S«r Arg Clu Ly. Ala Leu Thr Asn Ly. Arg Asn Cly Trp Val Ser 

551 CAA ACC ACT CTT ATA CAC TTT CCA AAA TTT CCC CCC TAT TTA CCA TAT AAA TTC CCA CAC 
181 Clu Arg S.r V.l II. clu Ph. Ala Ly, Ph. Ala Ala Tyr L.u Ala Tyr Lys Ph. Cly Asp 

II. ™ ATC TCC ACC ACA TTT AAT CAA CCT ATC CTC CTC CCC GAG TTC COG TAT TTA 

201 II. Val Asp M.e Trp Ser Thr Ph. Asn Clu Pro Met Val V.l Ala Clu Uiu Cly Tyr Leu 

661 CCC CCA TAC TCA CCA TTC CCC CCC CCA CTC ATC AAT CCA CAA CCA CCA AAC TTA CTT ATC 7 
221 Ala Pro Tyr Ser Cly Ph. Pro Pro Cly Val Met Asn Pro Clu Ala Ala Lys L«u Val Met 2 

721 CTA CAT ATC ATA AAC CCC CAT CCT TTA CCA TAT ACC ATC ATA AAC AAA TTT CAC ACA AAA 
241 L.U His H«t II. Asn Ala Hi. Ala Leu Ala Tyr Arg M.c II. Lys Lys Phe Asp Arg Ly. 




7 81 AAA CCT CAT CCA CAA TCA AAA CAA CCA CCT CAA ATA CCA ATT ATA TAC AAT AAC ATC CCC 
261 Ly. Ala Asp Pro Clu S.r Ly, Clu Pro Ala Clu II. Cly II. He Tyr Asn Asn II. Cly 

841 CTC ACA TAT CCC TTT AAT CCC AAA CAC TCA AAC CAT CTA CAA CCA TCC CAT AAT CCC AAT 
281 Val Thr Tyr Pro Ph. Asn Pro Ly. A.p Ser Ly. Asp L«u Cln Ala S.r Asp Asn Ala Asn 

901 rrC TTC CAC ACT CCC CTA TTC TTA ACC CCT ATC CAC ACC CCA AAA TTA AAT ATC CAA TTT 
301 Ph. Ph. His Ser Cly Leu Phe Leu Thr Ala He His Arg Cly Ly. Leu Asn He Clu Phe 

961 CAC CCA GAG ACA TTT CTT TAC CTT CCA TAT TTA AAC CCC AAT GAT TCC CTC CCA CTC AAT 
321 Asp Cly Clu Thr Phe Val Tyr Leu Pro Tyr Leu Ly, Cly Asn Asp Trp Leu Cly Val Asn 

1021 TAT TAT ACA ACA CAA CTC err AAA TAC CAA CAT CCC ATC TTT CCA ACT ATC CCT CTC ATA 
341 Tyr Tyr Thr Arg Clu Val Val Ly, Tyr Cln Asp Pro Met Phe Pro Ser He Pro Leu He 

1081 ACC TTC AAC CCC CTT CCA CAT TAT CCA TAC CCA TCT ACA CCA CCA ACC ACC TCA AAC CAC 114 0 
361 Ser Phe Lys Cly Val Pro Asp Tyr Cly Tyr Cly Cy, Arg Pro Cly Thr Thr Ser Ly, Asp 390 

1141 CCT AAT CCT CTT ACT CAC ATT CCA TCC CAC CTA TAT CCC AAA CCC ATC TAC CAC TCT ATA 
381 Gly Asn Pro Val Ser Asp He Cly Trp Clu Val Tyr Pro Ly, Cly Met Tyr Asp Ser He 

1201 CTA CCT GCC AAT CAA TAT CCA CTT CCT CTA TAC CTA ACA CAA AAC CCA ATA CCA CAT TCA 126 
401 Val Ala Ala Asn Clu Tyr Cly Val Pro Val Tyr Val Thr Clu Asn Cly He Ala Asp Ser 420 

1261 AAA CAT CTA TTA ACC CCC TAT TAC ATC CCA TCT CAC ATT CAA CCC ATC CAA CAC CCT TAC 13 20 
421 Ly« Asp Val Leu Arg Pro Tyr Tyr He Ala Ser His He Clu Ala Met Clu Clu Ala Tyr 440 

Figure ICL 
Figure 7b (Continued) 
Figure 8a. 

Thermococcus chitonophagus glycosidase (continued): nucleotides 1261-1533 and deduced amino acids 421-511, ending at the TGA stop codon (sequence text on this page is illegible in this copy). 

Figure 8b (Continued) 



WO 98/24799    PCT/US97/22623 



Bankia gouldi endoglucanase (37GP1) 

' ^« . 27 36 45 54 

5 • ATG ACA ATA CGT m GCG AOS CTC GCO CTC ttX: CCA CCC CTG AGC CCA CTC ACC 
Met Arg lie Axb Leu Ala Thr Leu Ala Leu cya Ala Ala L«u Sor Pro Val Thr 

72 Bl 90 99 108 

TTT CCA GAT AAT GTA ACC CTA CAA ATC GAC GCC GAC CCC GGT AAA AAA CTC ATC 
?h« Aid Aap Aan Val Thr V<kl Ola He Aap Al* Aap Cly Gly Lya Lye L«u lie 

^^"^ "6 135 144 153 

ACC CGA GCC err TAC GCC ATC AAT AAC TCC AAC CCA CAA ASC CPT ACC GAT ACT 
Scr Arg Ala Leu Tyr Gly Mec Asa Asn Ser A*n Ala Glu Scr Leu Thr Aap Thr 

171 180 189 198 207 21S 

GACTGGCAGCCTTTTCCCGATGCACOTGTGCGCATGCTOCGGGAAAATGGCCCC 
A«p Trp Gla Arg Phc Arg Aap Ala Cly Val Arg Hot Leu Arg Glu Aaa Gly Cly 

"5 234 243 252 261 270 

AAC AAC AGC ACC AAA TAT AAC TCC CAA CTC CAC CTG AGC AGT CAT CCC GAT TOG 
Aan A«n Scr. Thr Lys Tyr Asa Trp cia Leu Hia L=u Ser Scr Hia Pro As^ Ttd 

379 288 297 306 315 324 

TAC AAC AAT CTC TAC CCC CCC AAC AAC AAC TGG GAC AAC CCC GTA GCC CTG ATT 
Tyr Asn Aan Val Tyr Ala Gly Aan Aan Asn Trp A«p Aan Ary Val Ala Usu Ila 

333 342 351 360 3fi9 378 

CAGGAAAACCTGCCCCGCCCCGACACCATGTCCCCATTCCACCTCATCGCTAAC 
Gin Glu Aaa Leu Pro Cly Ala Asp Thr Vet Trp Ala Phe GIb Leu He Cly Lya 

3fi7 396 405 414 423 432 

GTC GOT OCT ACT TCT GCC TAC AAC TIT AAC GAT TTO 

Val Ala Ala Thr Ser Ala Tyr Asa Ur, Aap Trp Glu Phe Asn Gla Ser Cln 

450 459 468 477 436 

TOGTCCACCGCCOTCCCTCAGAATCTCCCTCGCCCCOOTGAACCCAATCTOGAC 
Trp Trp Thr Cly Val Ala Gla Aan Leu Ala Gly Gly Cly Glu Pro Aaa Leu Asp 

513 522 531 540 

GGC GGC GGC GAA QCG CTG GTT GAA GGA GAC CCC AAT CTC TAC CTC ATG GAT 7CG 
Cly Cly Cly Glu Ala Leu Val Clu Cly Asp Pro Aaa Leu Tyr Leu Met Aap Trp 

558 567 576 585 594 

TCGCCAQCCGACACTGTOGQTATrCTCGACCACTGCrrPGCCCTAAACCCCCrC 
Ser Pro Ala Asp Tte Val Cly He Leu Asp His Trp Phe Gly Val Asa Gly Lou 

"3 612 621 630 639 64B 

CCC CTC CCC CGT CCC AAA GCC AAA TAC TOO AGT ATQ GAT AAC CAG CCC CCC ATC 
Gly Val Arg Arg Gly Lys Ala Lys Tyr Trp Ser Met Asp Asn Glu Pro Gly He 

«57 666 675 684 593 702 

^ If ^ *^ '^'^ ^ ^ ^ GTA GAA GAT TTC 

Trp Val Gly Thr His Asp Asp Val Val Lys Clu Gin Thr Pro Val Glu Asp Phe 



Figure 9a 

Bankia gouldi endoglucanase (37GP1) (continued): nucleotide and deduced amino acid sequence (sequence text on this page is illegible in this copy). 

Figure 9b (Continued) 

Bankia gouldi endoglucanase (37GP1) (continued) 

X413 143:2 1431 1440 1449 1458 

ACC CAC ACT CKTC ACT ore GCT ATC GAC CAT TTC CCA CTG GAT CGC CCC TAC CCC 
Thr His Thr Al4 Thr V«l Ala lie Aap Aap Pb« Pro Lqu Acp Gly Pro Tyr Arg 

1^*'' 1476 14fi5 1494 1503 1512 

ACC CTC CGC TTA CAC AAC CTG ceo GGG GAG GAA ACC TTC CTA TCT CAC CGA GAC 
Thr L«u Arg Leu Hifl Aaa Leu Pro Gly Glu Glu Thr Phe VaI Ser His Arg Aap 

1521 1530 1539 1548 1557 1566 

AAC GCC CTO GAA AAA OCT ACA GTC CGC GCC AGC GAC AAT ACG GTA ACA CTC CAC 
Afia Ala L«u Glu Lys Gly Thx V*l Arg AIa Sor Acp Asn Thr Val Thr Leu Glu 

1575 1584 1593 1602 1611 

TTG CCC CCT CTG TCC GTT ACT GCA ATA ITG CTC AAG GCC CGG CCC TAA 3 ' 
Leu Pro Pro Leu S«r Val Thr Ala Ila Leu L«u Ly» Ala Arg Pro 



Figure 9c (Continued) 

Thermotoga maritima Alpha-Galactosidase: nucleotide and deduced amino acid sequence (sequence text on this page is illegible in this copy). 

Figure 10a 
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mot <vya maritiflia Alpha-OAiactosidauft 



A=P Leu thr Tn> Clu Clu T^r L.u Ly, A,„ 'l^^ Zl "' 



AtV Crc A*C AAC CTC *AC CTC OCC 
nur D.U Ly, A»n Leu j;!! uy, ^on Ph. Pro 

Clu vai cm xie *^ Asp Ai. ^ cij i;; ci; z;," 

7X1 720 729 

^ .^f^ 33! ^ ^ AAA ^ OCC 

V*a thr Ar, Gly A^ Ph« Pro Ser vll Glu vll ^II oIC 

^^Jf^!§!:!!ff!?f^™!!?*^?fff^^«TrcTCAAAa:TO 

A« cay H« lie Pro Cly II^ Zli Pr^ ^ vll ^ clu ^ ^ 

819 B2B 837 fij£ _ 

^!f^!!^^^ff^^5i^?f?f^fIf?™«^A*f««AAC^CACCCcI^ 
*«P ValPhe Aan Glu His Pro Asp Trp Val Vdl ClG Cly £ys 

873 882 831 son arte* 

-3!!5!^i^t?f!!f^if^WfA=viMoacic«OTia;AAAal 

Jtoc Al« Tyr Arg Aan Trp A«n Ly« Ly. II. All Z;;;! Aap 

«r- ^ 22Z ''♦S 954 563 • 

^™™™!!?f!!!3f«^"^"=«»3f5cre>GAAWMCGGeS§ 

Glu V»l 1^ Asn 55:p Leu Phe Asp Lw Pii« S« sZ^ Z^u A^ )tec 

'^^^ °" CC*^0» GAA ACA^^ 
*rg Tyr Phe Ly. II, Asp Phe Leu Pha All Cly Zll vH ^ Gl^ j;;^ 

^ t^f"^ f^"^^ f« CW^"C ACA AAA TO ATT GAG^^ ATC ACA^^ 
Ly» *an 11. -na: Pro He Gin Ala Ph. A^^ Z^ Gly III clZ Tto Til Zy« 

?2 fe's? "t'i?! - » - " c ccc'Ji: 

Ma Tnl Gly Clu Aap 5er Ptie Ue Leu Gly C^l Cly S« Pro III 
r-ry. ^^^2 ll'^O 1179 lias 

^ ^ '^'^ ''^^ ^'^^ ^ ^ ^ 

Val Gly cy« Vnl Arp Cly Mot ;urg lU Cly Pro Aa^ -niir P^o Phe Cly 



Figure 10b (Continued)
Thermotoga maritima Alpha-Galactosidase

Giu Hi, He du Zi'i z: r^'^iiz'^ -^i 



^f! !^ *« AAc «c'^ CM wr"^* 

rie L« Arg Olu Ci„ Ly, n«r i!;: ol^ qIZ Z^s" Z^i ^ 
X359 I3£fl 1 ^T-r 

'Vr Tl» Cy, Cly Via >.„ m« III nl Clu sj j;;; ^ 



v*i *x, ASP Hi, oxy Ly, r« ^.y," ^ Leu clu z^: zj;;! ^i; 

*rsProArgV.lcii,A«ile «.t Ser Cl„ Leu 

Ser Gly Thr Leu S«r Cly Asn Via Lya n« vil vll .'r;: 

^ His i« Olu Ly. clu Cly Ly, s^; z;; z;; 

OU Cac^S^ AM AAC^TO TAG TTC^T^^ cxx ^. 

TTC TAC a\A CAC OCT CAC ACA GAA TCA 3 . 

Giu Asp Cly Asn Phe ciu ciu ci; ciu ::: 



Leu 



Val Ser 



.-.-J CIu 



Figure 10c (Continued)
9 18 27 36 45 54

5' ATG GGG ATT GC?r GGC GAC GAC TCC TGG AGC CCG TCA GTA TCG GCG GAA TTC CTT

Met Gly Ile Gly Gly Asp Asp Ser Trp Ser Pro Ser Val Ser Ala Glu Phe Leu

63 72 81 90 99 108

TTA TTG ATC GTT GAG CTC TCT TTC GTT CTC TTT GCA AGT GAC GAG TTC GTG AAA

Leu Leu Ile Val Glu Leu Ser Phe Val Leu Phe Ala Ser Asp Glu Phe Val Lys

117 126 135 144 153 162

GTG GAA AAC GGA AAA TTC GCT CTG AAC GGA AAA GAA TTC AGA TTC ATT GGA AGC

Val Glu Asn Gly Lys Phe Ala Leu Asn Gly Lys Glu Phe Arg Phe Ile Gly Ser

171 180 189 198 207 216

AAC AAC TAC TAC ATG CAC TAC AAG AGC AAC GGA ATG ATA GAC AGT GTT CTG GAG

Asn Asn Tyr Tyr Met His Tyr Lys Ser Asn Gly Met Ile Asp Ser Val Leu Glu

225 234 243 252 261 270

AGT GCC AGA GAC ATG GGT ATA AAG GTC CTC AGA ATC TGG GGT TTC CTC GAC GGG

Ser Ala Arg Asp Met Gly Ile Lys Val Leu Arg Ile Trp Gly Phe Leu Asp Gly

279 288 297 306 315 324

GAG AGT TAC TGC AGA GAC AAG AAC ACC TAC ATG CAT CCT GAG CCC GGT GTT TTC

Glu Ser Tyr Cys Arg Asp Lys Asn Thr Tyr Met His Pro Glu Pro Gly Val Phe

333 342 351 360 369 378

GGG GTC CCA GAA GGA ATA TCG AAC GCC CAG AGC GGT TTC GAA AGA CTC GAC TAC

Gly Val Pro Glu Gly Ile Ser Asn Ala Gln Ser Gly Phe Glu Arg Leu Asp Tyr

387 396 405 414 423 432

ACA GTT GCG AAA GCG AAA GAA CTC GGT ATA AAA CTT GTC ATT GTT CTT GTG AAC

Thr Val Ala Lys Ala Lys Glu Leu Gly Ile Lys Leu Val Ile Val Leu Val Asn

441 450 459 468 477 486

AAC TGG GAC GAC TTC GGT GGA ATG AAC CAG TAC GTG AGG TGG TTT GGA GGA ACC

Asn Trp Asp Asp Phe Gly Gly Met Asn Gln Tyr Val Arg Trp Phe Gly Gly Thr

495 504 513 522 531 540

CAT CAC GAC GAT TTC TAC AGA GAT GAG AAG ATC AAA GAA GAG TAC AAA AAG TAC

His His Asp Asp Phe Tyr Arg Asp Glu Lys Ile Lys Glu Glu Tyr Lys Lys Tyr

Figure 11a
SS8 567 575 eoc 

!!!!!!!!!!!! ^^'^ '''^ °" ccr tac agg gaa 

V-1 S.r Phe Leu Val Aan Hia V.l" Z^tZ 'l^ 'vi^ vll HI g'J 

621 630 630 

!-! !!! ™ !!!! ccc ?^ gag acg 

Giu Pro Thr II. Met Ai. ci": ^ii i;;^ ~ ;~ ~; 

666 675 £84 «qi 

!!!!!! ^ ^ !!!!!! "^^^ "0 TAC ATA IJ^ 

Ly« S.r Cly A.a Thr Uu V.l «« Trp vll clu Met III III ^ HI 

720 729 738 747 

-!!!!!™!!!!^!^^"^'^'^™««G*^»*«»'ITCTrCAa;AAC 
Ser Leu Asp Pro Ai=n His Leu V.l Al. Val lly Z'p cIu lly IZ III sZ IZ 

!^ ^ °" G== ^ 

Tyr Glu Cly Phe Ly. Pro Tyr Gly Cly III 111 clu ^ ill ^ cly Z^ 
"8 837 846 055 

!!-!!!!!!™!!!!f**°'^'"«*"*««*™cTO'»cTOGBCAco 

ser Gly Val Asp Trp Ly. Ly, Leu Lw III III ^ yll Zl Phe lly 'rZ 
'"'^ 882 891 900 909 aio 

!!!!!!!!!!!! ^! !!? gcc cag roc 

Phe His L.U Tyr Pro S-r His Trp Gly Val s« Pro cIu Zl ^ Z.I III 
"6 945 954 963 

!!- !!! !!! !^ ^! atc gca aaa gag atc gga aaa S 

Gly Ala Lys Trp He Glu Asp Hi, He Lys Ho III clu III ciy L^ Pro 
'81 990 999 1008 ini7 

!!!!!!!!! !^ !t! *^ «^ «:c CCA AAc AGA acg^? 

Val val Leu Glu Clu TVr Cly He Pro Lys Ser wl Pro vll Zl Zl rZ III 
.■^ "** "53 1063 1071 1080 

-!! !!!!!!!!!! ^! '^'^ gat crc ogt gga gat gga gco Itc 

He Tyr Arg Leu Trp Asn Asp Leu Val Tyr Asp Leu Gly cly Isp Gly All III 

Figure 11b (Continued)
Th.«oto„ ^rUi^ P-^«... (contin,.., 

1098 H07 

!^ !!! I'! ^« <^ rco c.c'JJ^ ^^j^j 

.he Txp :-u n. 01. ci„ Zl ^ ^ ^ " 

1^52 1161 inft 

TAT ccG C.C Txc <^c <^ rrc ^ ^^ij^ ^ ^1188 

Tyr pro ^ ..e ^, vll Iln Zl III ^ 

1197 1206 1215 

L.U Xae OXu ryr .y. u.u ... Z^ '.Z 'oZ Zl Zl oZ Zl 

Thr cy, ser Phe II. L.« P.o Ly, A.p Cly Me^ Cl" l~ ^ ™ 

1305 1314- 1323 13,, 

CT^ ACC OCT GOT ^ TTC CAC TAC ACC ACG OU. AAo'JJ^ ^ 

Vaa A., Ala Cly V.1 ^ ^ Z^ ^ ^ ;~ ^ --- --- --- 

1359 136B 1377 .^p^ 

-!! !!^! !!!!!!!!! ^^'^ *™ CAT cTc CGA A««^ 

V.1 Glu A.P L.U vax Phe Glu A.n Zl IZ Zl Zl 'oZ ^ cZy Zl 
1413 1422 1431 

n! !!^! !!! !^ -T^^J CAA cAT^SJ A«, Tx/S 

civ. p.. A,p L,u A.P Thr xie P.^ Zl Zl Zl Zl Zl 

1467 1476 14B5 wqj 

f^!: !!f !^ ^" >^ »Tc 

=iu =1, .1. a, L„ a, vi »^ ^ ~- ~; ~; ~ ~ 

1521 1530 i535 

™ !!!^ °- "A CTT rrr Tcc'?S CCA gaa'^g 

Asn Glu xra xyr V.l u.u Al. Glu Z: Zl Zl Zl Zl sZ Zl Zl Zl 
1575 1584 1593 ifino 

^ ^ !!!!!! ^! ^ ™= CAc CCA GAG rrc'S ^ ccT 

V.l Ly, rrp Trp A-n Ser Gly 'rZ Zr^ Z'r. Zl ZZ Zl Z'y Zr Zl Zl 

Figure 11c (Continued)
1638 1647 = £ 

Ue Clu ^„ - - --- --- --- --- --- ... 

CCC CC^^S^ .oc C^C^J^ c^^^ ^ ^^^^ ^^^ZTja 

Pro CI. Uy, ser oiu Clu V.I G." l^l l^l ^ ^ ^ 

Ser Olu cy, Olu II. l«u Qlu Tyr Asp Ue" HI III ^ 'ai: ^ ^ 
1791 1800 1809 1910 , 

c« ^ ^ ^ CCC ..c ceo CTT c™ ;.c^S CCC roo'S^ ^0 .r.''^ 

Ly- OXy .eu A,, pro Tyr AI. va Leu ;« ^ ^ -~ 

L.U ASP H.t A3n A.„ Al. A,n V.X oxl III HI ^x" x"e' xi^ ^il^ 
1899 1908 1917 lo-sr 

!!f i!^ -™ "c 

Ly «1» x„ „. „. ^ ~ - ~ -~ -~ ... ... 

1953 1962 1971 ^con 

AAA CAA CT. CAC „A CCA CT. CXC CC, C., CA.^?^ 

Ly. cxu :-u Hu XX, cxy vax V.X cx; hL' - «; xi; 

2007 2016 2025 50^^ 

TTC ATC CAT AAT C^ ^ CTT TAT AaI ACA ACa'SS CCT A«. 'JJ^ 3 • 

phe xxe A.P A^„ V.X Arg Zlu i;; i;," i^; ::: 



Figure 11d (Continued)
9 18 27



»« ^ ,^ ,^ ~ --- --- ... .„ ... ... 

•I! ?f !!! ^ - - CO, .cc ^ ^ ^ ^ - 

„p u„ ^ ~ ~ ;^ ~ ~ ~ 

^ ^ !!!!!! ™ ^ =« c„ ccc ^ 

"^^^^^ ^ ^ „. - ~ ~ ~- ... ^ ... 

180 loe 

- ^ ^ - 
=iv n. «. ^ ^„ ~ -~ --- ... 

225 234 

^ !!! ^^f - r — - - ™ ccc 
L«. =1, 1.. ^ ^ „. ^; ~ ~ ~ ~- ... 

i ?!! !?! - S cxc „ ^ „ ^5 ^ - 
,u ,^ ~ ~ ~ ~ ~ ~ ~ 

342 

V. u. ^ ~ - ^ z:; ;;;; 

387 396 

«= =^ X„ ,^ ^ ^ „ ^ «. ^ ^ «3 ^ ^ 

- o.- v.. ^. ^ ^ ^, ^; - -~ ~~~~ ~ ... 

44X 450 4Cfi 

v.. ». ^ ^; ~ ~ ... ... „. „. ... 

S04 

ATA CTG CCA AGG MG AAG GCC CTC ACA AAC G*r . 540 

- -!: '^TC GGC TGG GTC TCC CAG 

He Val Ala Arg Glu Ly« Al« r «„ T 

Ly. Al. LOU Thr A.„ A,p Ar. He Gly Trp Val Ser Gin 

Figure 12a
*»XZ la P-aaiuie«ld>a« (MOBl) (continued)

5" 558 567 576 535 50. 

™ !!!!!! !!^! !!! '^'^ ™' <^ "^^^ *«= cat ccg crc gga 

Arg Thr Val Val Glu Phe Ala Ly, lyr Al^ III 'r^ nl HI ^1 HI j^" ^ 
"2 621 630 639 64a 

"!!!!!!! ™ !r= *~ eTA <n4 cms GAG 

iup leu val Asp Thr Trp S.r Thr Ph. Asn Glu Pro Met vll vll vll Glu [lu 
S66 675 684 693 702 

™ !!!!!! '^'^ '^^ <^ **c GAG Gcc 

Gly Tyr Leu Ala Pro Tyr Ser Gly Phe Pro Pro Gly Val Met HI Pro Glu Ala 

711 720 729 738 747 756

GCC AAG CTG ceo ATC CTC AAC ATG ATA AAC GCC CAC GCC TOJ GCA TAT AAG ATG

Ala Lys Leu Ala Ile Leu Asn Met Ile Asn Ala His Ala Leu Ala Tyr Lys Met

765 774 783 792 801 810

ATA AAG AGG TTC GAC ACC AAG AAG GCC GAT GAG GAT AGC AAG TCC CCT GCG GAC
Ile Lys Arg Phe Asp Thr Lys Lys Ala Asp Glu Asp Ser Lys Ser Pro Ala Asp

819 828 837 846 855 864

GTT GGC ATA ATT TAC AAC AAC ATC GGT GTT GCC TAC CCT AAA GAC CCT AAC GAT

Val Gly Ile Ile Tyr Asn Asn Ile Gly Val Ala Tyr Pro Lys Asp Pro Asn Asp

873 882 891 900 909 918

CCC AAG GAC GTT AAA GCA GCC GAA AAC GAC AAC TAC TTC CAC AGC GGA GTG TTC
Pro Lys Asp Val Lys Ala Ala Glu Asn Asp Asn Tyr Phe His Ser Gly Val Phe

927 936 945 954 963 972

TTT GAT GCC ATC CAC AAG OC?r AAG CTC AAC ATA GAG TTC GAC GGC GAA AAC TPr
Phe Asp Ala Ile His Lys Gly Lys Leu Asn Ile Glu Phe Asp Gly Glu Asn Phe

981 990 999 1008 1017 1026

GTA AAA GTT AGA CAC CTA AAA GGC AAT GAC TGG ATA GGC CTC AAC TAC TAC ACC

Val Lys Val Arg His Leu Lys Gly Asn Asp Trp Ile Gly Leu Asn Tyr Tyr Thr

1035 1044 1053 1062 1071 1080

CGC GAG GTT GTT AGA TAT TCG GAG CCC AAG TTC CCA AGT ATA CCC CTC ATA TCC

Arg Glu Val Val Arg Tyr Ser Glu Pro Lys Phe Pro Ser Ile Pro Leu Ile Ser

Figure 12b (Continued)
MFZX 1. p..^o.ld.,. (,3081) (continued)

1089 1098 1107 ,,,, 

«c ^ ooc CTT ccc ^ r.c occ T« ,cc ^^ii* ccc occ'l^ .cc ^c^^^J 

Pha .y, aiy V.1 P.O ^„ ryr Cly ^ s« ^ ^ ^ ^ ^ ^ ^ ^ 
1152 iifii 

™ !!! !!! — <« - »t ccc'S^ .^'J^l 

».P «y «.t v.i s„ „. ~ ~l ~ - ~ -~ -J- --- 

11S7 1206 1215 i55i! 

™ !^ === ^^'^ err Txc'SJ xcc «c'i:^ 

A.P S,r lie V.1 Olu Al. ^ ^ " ™ ^ ^ ~- --- 

1251 1260 1269 1378 

OCT or, ceo C.T Tcc ccc «c ACS CT. ccx"I| 

Gly Val Ala ser Ala Aap ~ ""^ ~ ~; 

1305 * 1314 1323 ni-j 

^ !^ !!! ^" S cta aw.'^ ,;,c atc"Ic 

ser He Giu clu He du «; vii z;; ^~ 

1359 1360 1377 ,,oe 

^_^_cr.^_^r^a^_o^r^acccrc'Sirrc xcc"?| ^ 

Trp XI. X.U Thr X.P X.„ Tvr du U: ™ ^ ~ ~- 

1413 1422 1431 

™ ?!! r« "= ~ ceo «='j^ 

<-u t:„ !,„ V.1 ».p „. s., .„ - j~ ~ ~ ~ 

1467 1476 

^ !!! ^ !!! "'''^J^ ccr^JS cc xxo'iJJ x«: xxx^^ 

=iu ne xx. XT, II, v.1 cm ser d; ;;x z;: l"; ii; i~ ii: 

1521 1530 1539 

GAG TTC CTG AAG GGT GW5 GAG AAA TCA 3' 

Glu Phe Leu Lys Gly Glu Glu Lys



Figure 12c (Continued)



WO 98/24799



27/46 



PCT/US97/22623 



OC1/4V Endoglucanase (33GP1)
9 18 27



Hec V. cxu ^, p.. ^, ^ --- --- ... ... 

3« s« ci; i;: z; ;i: ;;: - - - 

Ser. Mec Clu Cl„ Ser V.i xi. ox„ s« HI ~ ^ ^ ^ ^ ^ 

180 1B9 ICO 

^ !^ ^ «T S ^ ^ ^ 

Hec V.X CI. c., - ~: - --- - - 

oiy .za 01. xa, cau ^ cIJ ^ ^ ^ 

CIX PHe se. V.1 XXe P,„ XX. ^ Xl" s"^ 

P.O .sp XI. ^„ ~ - - - - ~- --- ... 

xia ex. ... ^„ ^„ ^ - --- .-- --. ... 

450 

^ !!! -T ™ ™= S ™, j;^ 

^ ^ ~ ~ --- ~- ~ ... ... 

504 ..^ 

^ ™ !!! ™ ™ ™ lii tl? 

^„ ^ ,„ ~ ~ ~ ... ... ... „. „. 

Figure 13a
OC1/4V Endoglucanase (33GP1) (continued)

549 558 567 576 585 594

GAG CCT GCT CAG AAC TTG ACA GCT GAA AAA TGG AAC GCA CTT TAT CCA AAA GTG

Glu Pro Ala Gln Asn Leu Thr Ala Glu Lys Trp Asn Ala Leu Tyr Pro Lys Val

603 612 621 630 639 648

CTC AAA GTT ATC AGG GAG AGC AAT CCA ACC CGG ATT GTC ATT ATC GAT GCT CCA

Leu Lys Val Ile Arg Glu Ser Asn Pro Thr Arg Ile Val Ile Ile Asp Ala Pro

657 666 675 684 693 702

AAC TGG GCA CAC TAT AGC GCA GTG AGA AGT CTA AAA TTA GTC AAC GAC AAA CGC

Asn Trp Ala His Tyr Ser Ala Val Arg Ser Leu Lys Leu Val Asn Asp Lys Arg

711 720 729 738 747 756

ATC ATT GTT TCC TTC CAT TAC TAC GAA CCT TTC AAA TTC ACA CAT CAG GGT GCC

Ile Ile Val Ser Phe His Tyr Tyr Glu Pro Phe Lys Phe Thr His Gln Gly Ala

765 774 783 792 801 810

GAA TGG GTT AAT CCC ATC CCA CCT GTT AGG GTT AAG TGG AAT GGC GAG GAA TGG

Glu Trp Val Asn Pro Ile Pro Pro Val Arg Val Lys Trp Asn Gly Glu Glu Trp

819 828 837 846 855 864

GAA ATT AAC CAA ATC AGA AGT CAT TTC AAA TAC GTG AGT GAC TGG GCA AAG CAA

Glu Ile Asn Gln Ile Arg Ser His Phe Lys Tyr Val Ser Asp Trp Ala Lys Gln

873 882 891 900 909 918

AAT AAC GTA CCA ATC TTT CTT GGT GAA TTC GGT GCT TAT TCA AAA GCA GAC ATG

Asn Asn Val Pro Ile Phe Leu Gly Glu Phe Gly Ala Tyr Ser Lys Ala Asp Met

927 936 945 954 963 972

GAC TCA AGG GTT AAG TGG ACC GAA AGT GTG AGA AAA ATG GCG GAA GAA TTT GGA

Asp Ser Arg Val Lys Trp Thr Glu Ser Val Arg Lys Met Ala Glu Glu Phe Gly

981 990 999 1008 1017 1026

TTT TCA TAC GCG TAT TGG GAA TTT TGT GCA GGA TTT GGC ATA TAC GAT AGA TGG

Phe Ser Tyr Ala Tyr Trp Glu Phe Cys Ala Gly Phe Gly Ile Tyr Asp Arg Trp

1035 1044 1053 1062 1071 1080

TCT CAA AAC TGG ATC GAA CCA TTG GCA ACA GCT GTG GTT GGC ACA GGC AAA GAG

Ser Gln Asn Trp Ile Glu Pro Leu Ala Thr Ala Val Val Gly Thr Gly Lys Glu
TAA 3'



Figure 13b (Continued)
5' ATC GAT CTT ACA AAO CTG GSG aTC ATA Cm xrr JS. *^ 54 

-- ^ '^'^ CTC AAC GAG TCW CAC GCA AAA 

«.c v.. X.. n,- c;; ~ =-:; ii; ;i; ~ 

°= ™ =1? cS ^ ^ - 

v.. „^ ~ ~ --. --- --. ... ... _ 

f!f °" ii! =« iS ccc 

v« «. „. ~ - ~; ~ ~ ~ ~ --- 

^"'^ 10Q 

r.. «» =1, ^ ^ ~ -- -- ~ ~; ~ ~ - ~ 

234 

P.. v« ^ ^ - ^- - -. ... „. 

™ ccc 1^ _^ ^ =« ^ 2S «T occ „. „ ^ 2^ 

^ v.. s.. «u la s z; ii: 

^ ~ ^ cxcl?: ^ ^ ^ 
V.1 ^ ^ ^ ~ ~ ~ ~ 

™ <»« m A«: »T» ^ T»c ceo oa CTC ak: ™ 

— ATC ATG ATG GAG ATC 

Val Glu Leu lie He Glu Gly Tv^ lCI oZZ T 

u oiy Tyr Lys Pro Ale Arg Val He Met Met Glu He 

JCQ 

™ =^ =^ TAC T« «C «T OCA «0 ™ S =,» CCA «C 

A.. ^ ^ ^ - ~; ~ -.- ^. „. 

504 

!!! ^ AAO CTA AAC Si CTT CTC 

XI, T.P se. p„ v.a si; - - - - - 

Figure 14a
Thermotoga maritima Pullulanase (Bo,3, (continued)

"8 567 57« COS 

^ ™ !^ !^ !!! ™f cnl aac atg g" tac aac 

Lys Asn Gly Glu Asp Thr Gl"u ^r'^ 't^ IVn 7.1 7al nil gIu 't^ lys H'y 
612 621 «30 c,« 

^! !!!!!!!!! !!!!!!!!! GGA rrc TAC J" 

Asn Gly V.l Trp Glu Ala vll vll Gl"u Gl'; Zl Zl lly CII ™ ~ 

fi7S 684 fioi 

!-! ™ !!! !!!! ^! !*! ^ aca acc gtc gat ta, tcg 

Tyr Gin Uu Glu Xsn Tyr Gly Ly, ~xZ Zl vH Zl '7^ ser 

- ''2° '25 738 747 -.se 

!f!!!!!*!!f^^^^!^!*===°=«"°«**'-crrGCCAGGAc:Al^ 

Ala V«l Tyr Al. Am A«n Gin Glu S«r Al. V«l vll Zl Zu Zl Z'g ^ 

''74 783 792 oni 

™ ^ !!: !^ ^ ^ ^ f = TAC GAA GAC KG° 

Pro Glu Gly Trp Glu A« A.p Zy Zl IZ lie gIu Zy ^ IZ Zl 

M"' 846 855 ac< 

*!! ™! ^ *CA CGA CTC GAA AAC TCC GGG ctI 

He !!• tyr 01« II. Hi. He Al. Zl Zl ~rZ Zly Leu Glu Zl S« Gly vll 

"1 900 909 oiH 

Ly« A*n Lys Gly Leu Tyr Le« Gly Leu Thr Glu Glu Zl tZ LyI Gly ^ro Gly 

SIS 954 og^ 

~!!!!t!^™!!!!!!!!!!*^="'™"*™«"«"A»cACGrr?IJ 

Gly V«l Thr Thr Gly Leu Ser Hi. I^u vZ Glu lIu Gly vll Z^ ZZ vll hII 

999 1008 1017 ,n-,e 

!!! ™ !!!!!! !^! !!! !t! ^ ~* "c Sic 

He Leu Pro Ph. Phe A.p Ph. Tyr Thr Gly Zl gIu Zl Zl Lya Zl Zl Zl 

1053 1062 1071 

tt! !t! !*! !!! ?!! !*! !*! «t » ac c^ ttc at^ orr ccJ gag cgc J2! 

Ly. Tyr Tyr A.„ Trp Gly Tyr A.p Pro 'tZ Zl Zl Met Z.I Zl Z.I Gly Z'g 
Figure 14b (Continued)
X..«o.o.. ^^^^^^ ^^^^^^^^^^^ 

1089 1098 1107 

T« 3c. „c - «c c« c« .CO ^ ^1.=. 

'v. s« T,„ «p „, ,„ ~ - ■-. ~ ..- ... .„ ... ... 



V- «. ^ ~ - ;;; ~ ~ -~ --- ... ... ... 



1197 1206 13m 

~ ^ !!f "c'S^ a. .cc'^ ^ 

... r^^^^ „. ^ - ~ ~- ~ ;~ ~ ... ... ^ ^ 

™ r^f !?f - - - - =« „ 

T„ «, n. ^ ~ ~ - ~ ~ - ... 

1305 1314 1321 

CTC ^ ^ ^ ^ ^ ^ ^^u., ^ 

V.X i..^^ s„ ^; ~ - ~ ~- ~- ^ ... ^ 

^ v.. « „. „. I, ii; ~ ^ ^. „. 

1422 li-Ji 

.3= «c ^ ^ ^ ^ ^ ^ ^ ^ ^ ^^.^ 

.1. ..p ^ ^. u„ „J ~ - ~ -~ ... ^ „. 

^^^7 1476 14PC 

«. n. ^„ ^ 0., p„ ^ ~ -~ -~ ... 

1521 1530 i53g 

.V, s„ ^ „. ^ ~ ~ ~ - ~ ~ ~ ~ -. 

157S 1584 ,co, 

GAC CCA ATA AGO GOT TCC GTC TTC AAC CCG *r^^ "20 

!r ^ '''^ **G CGA TK: CTC ATC GGA 

A»P Ala n. Arg Cly S« Val Ph. A.n ser vll III 'r^' IZ 

o s»er val Lys Gly Phe Val Met Gly 

Figure 14c (Continued)
"38 1647 ifisfi 

1^.'^.^'.'^. ^f! ^ °" «^ Acc .ac":J 

oly Giu Thr l;." Ill zi ci; vii gI; s« i": ;;;; 

^583 1692 1701 T7in 

^ ^ !!! fff S <^ c;.^^? ^'lll 

Asp Cly Ly, Leu II. Lya Ser Ph. mI Zl Vrl cl^ ciu 'tZ zxl 'r^; 

™ ^. ^"'^l f^f ^'fi «c'^ *AC ,.c'S GCC cecals 

AI. Al. Cy. Hi. Aap X,„ hII Zl Zn ^ Uu Zl Zl l^^ 

^^^^ 1800 1809 IPIfl no«i^ 

!!! ^ !^ !!! *f f ^, aaa a^^'Sc aaa'Sc' 

Al. ASP Ly. Ly. Ly. Clu Trp 'oZ IZ 'oZ Zl lyl Zl Zl Zl lyl Zl 



1845 1854 



Al- Gly Al. II. L.U Leu Tto Ser Zl Zy vZ Zl Zl Z^ Zl Zy Zy Zl 

1944 



1899 1908 1917 



^ ^! !!! ^ «^ aac ^ ccr 

A.P Ph. cy. Arg Thr Zl Zl Zl Zl Zl IZ 't^ Zll Zl Zl Zl IZ 
1953 1962 1971 

^ !!f !!! !^! !^ !^ ^ ^ gac'^S 
ne As„ Cly Ph. ASP Zl Zl IZ Zl Zl Z:. Zl Zl Zl ZZ Zl 'tZ 

2007 201S 2025 ^nu 

^! !!!!!! ^ !!! ccT TTc'^ c« aaa'^ 

His Ly, Cly ^ n. Zl IZ Zl Zl Zl Zl ZZ Zl Zl Zl 'i^Z 

2061 2070 2079 2088 

!f! !^ !!!!!! ^ f!^! -c'JS ACA ATA^J^ 

Ala Glu Glu He Ly. Ly. Hi. L.« Clu Zl Zl Zl Zy III Zl Zl Zl "' 



Val 



2115 2124 2133 2143 

GCC TTC ATC C« AAA CAC CAC CCA GGX ^''^ ^^JJJ 

Al- Ph. Hec Leu Ly, A.p Hi. All Zl Zl Zl Zl ZZ HI Zl Zl Zl ZZ 

Figure 14d (Continued)
2169 2178 

T« ^ ^ ^ «. rJi^ ™ ccx'^ =a> 

n. ^ ^ *■» - 'V «; ;^ ~ - ~ 

2223 2232 23ji 

^ ~ ------ - 

v.. v.. ^. ~ - ~ ~- --- --. .-- ^- ... ... 

^^^^ 2286 350C 

c^c ccc c,, 4? =0= ™ ^ ^ ^. 

«y,™r I.. ,„ ^ ~ ~ ~ ~- ~ ~- ~- ... 



Figure 14e (Continued)
Figure 15. Thermotoga maritima MSB8 (Clone # 6GP2) Glycosidase



1



CTT TTA TTG ATC GTT GAG CTC TCT TTC GTT CTC TTT GCA AGT GAC GAG TTC
Leu Leu Leu Ile Val Glu Leu Ser Phe Val Leu Phe Ala Ser Asp Glu Phe

GTG AAA GTG GAA AAC GGA AAA TTC GCT CTG AAC GGA AAA GAA TTC AGA TTC
Val Lys Val Glu Asn Gly Lys Phe Ala Leu Asn Gly Lys Glu Phe Arg Phe

ATT GGA AGC AAC AAC TAC TAC ATG CAC TAC AAG AGC AAC GGA ATG ATA GAC
Ile Gly Ser Asn Asn Tyr Tyr Met His Tyr Lys Ser Asn Gly Met Ile Asp

AGT GTT CTG GAG AGT GCC AGA GAC ATG GGT ATA AAG GTC CTC AGA ATC TGG
Ser Val Leu Glu Ser Ala Arg Asp Met Gly Ile Lys Val Leu Arg Ile Trp

GGT TTC CTC GAC GGG GAG AGT TAC TGC AGA GAC AAG AAC ACC TAC ATG CAT
Gly Phe Leu Asp Gly Glu Ser Tyr Cys Arg Asp Lys Asn Thr Tyr Met His

CCT GAG CCC GGT GTT TTC GGG GTG CCA GAA GGA ATA TCG AAC GCC CAG AGC
Pro Glu Pro Gly Val Phe Gly Val Pro Glu Gly Ile Ser Asn Ala Gln Ser

GGT TTC GAA AGA CTC GAC TAC ACA GTT GCG AAA GCG AAA GAA CTC GGT ATA
Gly Phe Glu Arg Leu Asp Tyr Thr Val Ala Lys Ala Lys Glu Leu Gly Ile

AAA CTT GTC ATT GTT CTT GTG AAC AAC TGG GAC GAC TTC GGT GGA ATG AAC
Lys Leu Val Ile Val Leu Val Asn Asn Trp Asp Asp Phe Gly Gly Met Asn

CAG TAC GTG AGG TGG TTT GGA GGA ACC CAT CAC GAC GAT TTC TAC AGA GAT
Gln Tyr Val Arg Trp Phe Gly Gly Thr His His Asp Asp Phe Tyr Arg Asp

GAG AAG ATC AAA GAA GAG TAC AAA AAG TAC GTC TCC TTT CTC GTA AAC CAT
Glu Lys Ile Lys Glu Glu Tyr Lys Lys Tyr Val Ser Phe Leu Val Asn His

GTC AAT ACC TAC ACG GGA GTT CCT TAC AGG GAA GAG CCC ACC ATC ATG GCC
Val Asn Thr Tyr Thr Gly Val Pro Tyr Arg Glu Glu Pro Thr Ile Met Ala

TGG GAG CTT GCA AAC GAA CCG CGC TGT GAG ACG GAC AAA TCG GGG AAC ACG 
Trp Glu Leu Ala Asn Glu Pro Arg Cys Glu Thr Asp Lys Ser Gly Asn Thr 
z z z 2 z z r r r - 

lyr lie Lyg ggr Leu Asp Pro 

r - - - z - 2 - - - - - z z 

orr OAC Tcc ^ ctc ctt xcc .cc cxo c.c ttc gcc .co ttc 

val ASP Trp .ys .y. .eu .eu Ser xa. ol. rUr Val ..p p.^ Gly p^e 

Ma x" r r - 

lie Tyr Arg Leu Trp Asn Asp Leu Val Tyr Asp Leu Oly Gly Asp 
QGA GCO ATO TTC TGG ATG CTC GCO GGA ATC GGG Ca;. r-r^ 

0. ... - - - - - - 

2 ™ z z z ^° r r - ™ - «" =^ - 

9 Cly lyr Ty, Ko »»p Tyr A.p „iy pt. 

^'^^222ZT.z:2Z7.zr::—' 

3 «iu iyr Aia Lys Leu Phe Asn Thr Gly 
GAA GAC ATA AGA GAA GAG ACC TGC TCT TTC ATf rr^ 

Glu Asp He Aro Glu a. . "^"^^ ^ ^'^C GGC ATG 

P Arg Glu Asp Thr Cys Ser Phe He Leu Pro Lys Asp Gly Met 

ATC AAA AAC ACC GTG GAA GTG AGO OCT GGT CTT TTC GAC TAC AOC AAC 



Figure 15b (continued)
Glu Ile Lys Lys Thr Val Glu Val Arg Ala Gly Val Phe Asp Tyr Ser Asn

ACG TTT GAA AAG TTG TCT GTC AAA GTC GAA GAT CTG GTT TTT GAA AAT GAG
Thr Phe Glu Lys Leu Ser Val Lys Val Glu Asp Leu Val Phe Glu Asn Glu

ATA GAG CAT CTC GGA TAC GGA ATT TAC GGC TTT GAT CTC GAC ACA ACC CGG
Ile Glu His Leu Gly Tyr Gly Ile Tyr Gly Phe Asp Leu Asp Thr Thr Arg

ATC CCG GAT GGA GAA CAT GAA ATG TTC CTT GAA GGC CAC TTT CAG GGA AAA
Ile Pro Asp Gly Glu His Glu Met Phe Leu Glu Gly His Phe Gln Gly Lys

ACG GTG AAA GAC TCT ATC AAA GCG AAA GTG GTG AAC GAA GCA CGG TAC GTG
Thr Val Lys Asp Ser Ile Lys Ala Lys Val Val Asn Glu Ala Arg Tyr Val

CTC GCA GAG GAA GTT GAT TTT TCC TCT CCA GAA GAG GTG AAA AAC TGG TGG
Leu Ala Glu Glu Val Asp Phe Ser Ser Pro Glu Glu Val Lys Asn Trp Trp



AAC AGC GGA ACC TGG CAG GCA GAG TTC GGG TCA CCT GAC ATT GAA TGG AAC
Asn Ser Gly Thr Trp Gln Ala Glu Phe Gly Ser Pro Asp Ile Glu Trp Asn

GGT GAG GTG GGA AAT GGA GCA CTG CAG CTG AAC GTG AAA CTG CCC GGA AAG
Gly Glu Val Gly Asn Gly Ala Leu Gln Leu Asn Val Lys Leu Pro Gly Lys

AGC GAC TGG GAA GAA GTG AGA GTA GCA AGG AAG TTC GAA AGA CTC TCA GAA
Ser Asp Trp Glu Glu Val Arg Val Ala Arg Lys Phe Glu Arg Leu Ser Glu

TGT GAG ATC CTC GAG TAC GAC ATC TAC ATT CCA AAC GTC GAG GGA CTC AAG
Cys Glu Ile Leu Glu Tyr Asp Ile Tyr Ile Pro Asn Val Glu Gly Leu Lys

GGA AGG TTG AGG CCG TAC GCG GTT CTG AAC CCC GGC TGG GTG AAG ATA GGC
Gly Arg Leu Arg Pro Tyr Ala Val Leu Asn Pro Gly Trp Val Lys Ile Gly

CTC GAC ATG AAC AAC GCG AAC GTG GAA AGT GCG GAG ATC ATC ACT TTC GGC
Leu Asp Met Asn Asn Ala Asn Val Glu Ser Ala Glu Ile Ile Thr Phe Gly

GGA AAA GAG TAC AGA AGA TTC CAT GTA AGA ATT GAG TTC GAC AGA ACA GCG
Gly Lys Glu Tyr Arg Arg Phe His Val Arg Ile Glu Phe Asp Arg Thr Ala



Figure 15c (continued)
GGG GTG AAA GAA CTT CAC
Gly Val Lys Glu Leu His

GGA CCG ATT TTC ATC GAT
Gly Pro Ile Phe Ile Asp

TGA 1991
END



ATA GGA GTT GTC GGT
Ile Gly Val Val Gly

AAT GTG AGA CTT TAT
Asn Val Arg Leu Tyr



GAT CAT CTG AGG TAC GAT
Asp His Leu Arg Tyr Asp

AAA AGA ACA GGA GGT ATG
Lys Arg Thr Gly Gly Met



Figure 15d (continued)
Figure No. 16a. Thermotoga maritima MSB8 («gb4)

1 ATG AAA AGA ATC GAC CTG AAT GGT TTC TGG AGC GTT AGG GAT AAC GAA GGG AGA TTT TCG 60
1 Met Lys Arg Ile Asp Leu Asn Gly Phe Trp Ser Val Arg Asp Asn Glu Gly Arg Phe Ser 20

61 TTT GAA GGG ACT GTG CCA GGG GTT GTC CAG GCA GAT CTG GTC AGA AAA GGT CTT CTT CCA 120
21 Phe Glu Gly Thr Val Pro Gly Val Val Gln Ala Asp Leu Val Arg Lys Gly Leu Leu Pro 40

121 CAC CCG TAC GTT GGG ATG AAC GAA GAT CTC TTC AAG GAA ATA GAA GAC AGA GAG TGG ATC 180
41 His Pro Tyr Val Gly Met Asn Glu Asp Leu Phe Lys Glu Ile Glu Asp Arg Glu Trp Ile 60

181 TAC GAG AGG GAG TTC GAG TTC AAA GAA GAT GTG AAA GAG GGG GAA CGT GTC GAT CTC GTT 240
61 Tyr Glu Arg Glu Phe Glu Phe Lys Glu Asp Val Lys Glu Gly Glu Arg Val Asp Leu Val 80

241 TTT GAG GGC GTC GAC ACG CTG TCG GAT GTT TAT CTG AAC GGT GTT TAC CTT GGA AGC ACC 300
81 Phe Glu Gly Val Asp Thr Leu Ser Asp Val Tyr Leu Asn Gly Val Tyr Leu Gly Ser Thr 100

301 GAA GAC ATG TTC ATC GAG TAT CGC TTC GAT GTC ACG AAC GTG TTG AAA GAA AAG AAT CAC 360
101 Glu Asp Met Phe Ile Glu Tyr Arg Phe Asp Val Thr Asn Val Leu Lys Glu Lys Asn His 120

361 CTG AAG GTG TAC ATA AAA TCT CCC ATC AGA GTT CCG AAA ACT CTC GAG CAG AAC TAC GGG 420
121 Leu Lys Val Tyr Ile Lys Ser Pro Ile Arg Val Pro Lys Thr Leu Glu Gln Asn Tyr Gly 140

421 GTC CTC GGC GGT CCT GAA GAT CCC ATC AGA GGA TAC ATA AGA AAA GCC CAG TAT TCG TAC 480
141 Val Leu Gly Gly Pro Glu Asp Pro Ile Arg Gly Tyr Ile Arg Lys Ala Gln Tyr Ser Tyr 160

481 GGA TGG GAC TGG GGT GCC AGA ATC GTT ACA AGC GGT ATT TGG AAA CCC GTC TAC CTC GAG 540
161 Gly Trp Asp Trp Gly Ala Arg Ile Val Thr Ser Gly Ile Trp Lys Pro Val Tyr Leu Glu 180

541 GTG TAC AGG GCA CGT CTT CAG GAT TCA ACG GCT TAT CTG TTG GAA CTT GAG GGG AAA GAT 600
181 Val Tyr Arg Ala Arg Leu Gln Asp Ser Thr Ala Tyr Leu Leu Glu Leu Glu Gly Lys Asp 200

601 GCC CTT GTG AGG GTG AAC GGT TTC GTA CAC GGG GAA GGA AAT CTC ATT GTG GAA GTT TAT 660
201 Ala Leu Val Arg Val Asn Gly Phe Val His Gly Glu Gly Asn Leu Ile Val Glu Val Tyr 220

661 GTA AAC GGT GAA AAG ATA GGG GAG TTT CCT GTT CTT GAA AAG AAC GGA GAA AAG CTC TTC 720
221 Val Asn Gly Glu Lys Ile Gly Glu Phe Pro Val Leu Glu Lys Asn Gly Glu Lys Leu Phe 240

721 GAT GGA GTG TTC CAC CTG AAA GAT GTG AAA CTA TGG TAT CCG TGG AAC GTG GGG AAA CCG 780
241 Asp Gly Val Phe His Leu Lys Asp Val Lys Leu Trp Tyr Pro Trp Asn Val Gly Lys Pro 260
Ser He V.X Leu Trp Cy. Gly Asn Asn Glu Asn Asn 420 
12S1 TGG GGA TTC GAT GAA TOG GGA AAT ATG rrr a^. 

^ zzzzzz z. z r °" 

s i-ya vai Asp wrly lie Asn Leu Gly Asn 440 
■<<1 ore TOO T«C TO K» «0T MC KSO »T0 .AC TAC ..n ... 

<" ,^ ^ ".° " " =t ^ ^ r r °" 

y ^lu Asn Tyr Glu Lys Asp Thr Gly Arg SOO 

A-eu i-ys HIS Asn Lys Gin Val Glu 540 
Figure 16b (continued)
1621 GGA CAG GAA AGA TTG ATC AGG TTC ATA TTr era hjit ^ 

- - - - ^ - - - - - - - - - 0. 

'"i s": p';:! v'li r r "^'^ "° °« 

Phe val Tyr Uu Ser Gin .eu As„ cin Ala Glu Ma He .y, P.e Gly val 



1861 



«r9 .h, U, ,l„ vl L™ fro ..1 t.„ i„ ^ „„ 
». L„ ^ s„ „^ ^ ^ 



1981 



AT^ Glu Glu Cly Arg Lys Gly He Arg Lys Asp Leu Gin Asn Gly Thr Pro Ser Arg Arg 



1740 
580 

1800 
600 

1860 
620 

1920 
640 

1960 
660 

2040 
680 



2041 TGT GAG TTT GGT TCA 205S 
681 Cys Glu Phe Gly End 685 



Figure 16c (continued)

Figure No. 17a. Bankia gouldi (379p4)

1 ATG AAA A;u AAT CTA CTA ATC TTT AAA A« err ACG TAT CTA CCT TTG TTT TTA ATG CTO 60
He. X.S Asn .eu .eu Me. P.e .ys Arg .e. T.. T^ p„ ™ 20

61 CTC TCA CTA ACT TCA GTA GCT CAA TCT CCT GTA CAA AAA CAT GGC CGT TTA rr. . 

^1 .e. se. se. Se. Va. aU G. s« Pro Va: h"! ^ i: 2 Zl Zl Z 

121 GGA AAC CGC ATT CTT AAT GCG TCT GGA GAA ATT ACG Arr nr^ 

0. A. A.3 ne A. .a s« 0. G. :^ z: :z z z z: -i: 

m <^Cr GGA CAC ACC TCC GAT TTT TAT AAT GCA CAA ACT GTT GAT TT, TTA OCA 

«1 Trp ser Asn Ala Gl. Asp T.r Ser A-p PKe T.. Ma GIu r.r Val Asp ^ 



240 
80 



'a*; gT t" r '^'^ AAA GAA AAT TGO GAT GGC 300 

Clu A3n Trp As„ ser Ser Uu rXe Arg XXe Ala «e. Gl. Val gIu As„ Trp Asp Gly ^oo 

'oJ r r =^ °" ^ AAA GTT ATT GAT 360 

-OX Gl. As„ Gl. ryr Xle Asp Ser Pro Gl„ olu Gin Glu Ala .,s Xla Arg Val a'^ Lo 

l" !^rr°"'^'°''''™'""*"'^"*"^™^«-*"'^^«^CCAGAGTTA «0 

1- Ala Ala Xle Ala Asn Gl. Xle Ty. Val Xle Xle Asp Trp Hi. Thr His Glu Ma G^ ^ "o 

^ ^T"rr'"°"''''""'*'=*°^'^^<^°^'^"*--=«^=---CTCCC «0 

Tyr Thr ;.p clu Ala Val .sp PHe P.e T.r Arg Mec Ala Asp Uu ryr Oly Asp Thr Pro x« 

«1 As„ val „e. T^ Clu Xle Tyr Asn Glu Pro XI. Tyr Gin Ser Trp Pro Val Xle Z Z 

yr Ala Glu Gin Val He Ala Gly Xle Arg Ser Lys Asp Pro Asp Asn Leu lie Xle Val 200 

2 z t" r r t °" ^ - - - - - «o 

^ly Thr Ser Asn Tyr Ser Gin Gin Val Asp Val Ala Ser ai^ « 

«P vai Aia ser Ala Asp Pro He Ser Asp Thr 220 

"1 GTA CCA CAG ACA CCA TTA GAT AAT AAT GTT GCT TTG TTT GTT ArA r.. 

241 Val Ala Gin Thr Ala i. » TTG TTT GTT ACA GAA TGG GGT ACA ATT 7B0 

Thr Ala Leu Asp Asn Asn v.i Ala Leu Phe Val Thr Clu Trp Gly Thr Xle 260 






960 
320 



III r r =^ ^ ^ ACT AAT ACT TGG CCC TTT TTG e« 

«1 Leu As„ Th. Cly Cln Cly Giu p.^ Asp .,3 Clu S„ Thr Asn Thr Trp Mec Ala Phe Zu .bo 

2B1 ^J)! r r "° GAC AAA OCT TTT CCT GAA ACA SOO 

-1 ..s Glu cxy xu ser His Ala Asn Trp Ser .eu Ser Asp ..3 Ala P.e P.o o" Tr Z 

III T r CAA CCA GGA CAA COT CTA TCT GOT TTA ATT ACC AAT AAA CTT ACA GCC 

301 Gly ser V.l Val Gin Ala Gl, Gin Gl, Val Ser Oly Uu Ila Ser Asn Lys 2 1 

Gly Glu lie val Lys Asn lie He Gin Asn Trp Asp Thr Glu Thr Ser Thr Gly Pro U: 

^"l Z Tr 7r G? r r CAA ACA GCA CAA GCA aOBO 

Thr Thr Gin Cys Ser Thr He Glu Cys He Arg Ala Ala «et Glu Thr Ala Gin Ala 360 

Z x" 2 n T AAT TTT CAA GAC AAG ATA CAA GGT GCC Xl.O 

ly ASP Glu Xle lie lie Ala Pro Gly Asn Tyr Asn Phe Gin Asp Lys lie Gin Gly Ala 3B0 

^ z z t: r r '^^^ - - - 

Arg ser Val lyr Leu Tyr Gly Ser Ala Asn Gly Asn Ser Thr Asn Pro lie Xle .00 

'2 gT T T °" "'^ X«0 

Leu Arg Gly Glu Ser Ala Thr Asn Pro Pro Val Phe Ser Gly X,eu Asp Tyr Asn Asn Gly «0 



SO 
60 



T^: Z x" T T -G TTT AAA act GGG X3.C 

Leu ser Xle Glu Gly Asp Tyr Trp Asn lie Lys Asp lie Glu Phe Lys Thr Gly 440 

1321 AAA GGT ATT GTT CTT GAC AAT TCT AAT GGT AGT AAA TTA AAA AAC CTT GTT GTT CAT X3 

-X ser Lys Gly Xle Val .eu Asp Asn Ser Asn Gly Ser Lys Leu Ly. Asn Leu v" Z 2s 4 

"sl Z Z Ty 2 gT r r T - - 

Gly Glu Glu Ala Xle His Lau Arg Asp Gly Ser Ser Asn Asn Ser lie Asp Gly 480 

cys Thr Xle Tyr Asn Thr Gly Arg Thr Lys Pro Gly Phe Gly Glu Gly .eu Tyr Val Gly SCO 

ASP Lys Gly Gin His Asp Thr Tyr Glu Arg Ala Cys Asn Asn Asn Thr lie Glu Asn S20 

^»l Z T^ Z Ty IZ r 7: ? °" - - - - 1«0 

Gly Pro Asn Val Thr Ala Glu Gly Val Asp Val Lys Glu Gly Thr Met Asn 540 

Figure 17b (continued) 
^'^^^i-:zzzzzzzzzzzzzz - 

-^^^^—zzzzzzzzzzzzz z 
-^^^zzzzzzzzzzzzzz— z 

^^^zzzzzzzzzzzzzzzzz z 

-^^^^zzzzzzzzzzzzzzz z 

z z z z z z z z zzz zzz z - - - - - 

^*ox «c TTT rcr aac gtt ttt gac tta gga tct ggc gga CCA rcr tta act AAT rr 

^ A^i TTA AGT AAT TTA AAA ACA 2460 

Figure 17c (continued)
"1 L„ „. „„ V.1 p.. ,1. ^ ^ 

K= "I to! t»c m T«, au, rrr ic» ,T. «AC AC , 

»= ^- n. ^ „. - ™ - - - 

OTA ACA.TCA GAT AAC COT AAT TTT OTO ATG OTA TCT AAA ACT AAT AAT TTT ACO ATA TAC „«a 

aax val t^ s.r Asp a,„ o^, ^„ p,. ^„ ^^^^ ^ - - 

ZZrT T °" ^'^^ 

Phe S« Asa Asp AX. Thr Ala Pro XXe Cy. A,„ v.l Thr Pro Ser Asn 01„ He Sex Uys S20 

rZ Z r 7 r - - -T TTA CAC CAA ACT .B.O 

ne Th. A,p ASP ser Ser He Asn Phe I.ys Leu Tyr Pro Asn Pro Ala Leu Asp Glu Thr 940 

7.1 7 r ""^ °" "° "° « »'0 

941 He Phe Val Ser Ala Olu Aep eiu Ly, Leu Ala Leu Val Leu Val Pro 956 



0 



Figure 17d (continued)
Figure No. 18a. Pyrococcus furiosus VC1 (7EG1)
leader sequence: amino acids 1-24 

= ■ AT= .CC ^ ^ Z CTC ATC oil TCT .TC Z AC ATC CTA Jo 

Met se. .,3 ..s ..s PHe VaX Ue VaZ Se. xie .eu Th. Xle .eu J v" 2 

" " SO 99 

CCA ATA TAT TTT CTA CAA AA3 TAT CAT ACC TCT OAO OAC AAG TCA ACT TCa '^T 
Ala lie Tyr Phe Val Olu Lys Tyr His Thr Ser Olu Asp Lys Ser Thr sTr Z 

""^ 135 144 

ACC TCA.TCT ACA CCA CCC CAA ACA ACA CTT TCC ACT ACC AAG GTT CTC AAG Z 
Thr ser Ser Thr Pro Pro Gin Thr Thr .eu Ser Thr TKr Lys Val x,eu 

"° "9 198 216 

AGA TAC CCT GAT GAC GGT GAG TGG CCA GGA GCT CCT ATT GAT AAG GAT GGT GAT 
Arg Tyr Pro Asp Asp Gly Glu Trp Pro Gly Ala Pro Ile Asp Lys Asp Gly Asp



"4 243 252 261 



GGG AAC CCA GAA TTC TAC ATT GAA ATA AAC CTA TGG AAC ATT ^ AAT GCT A^
Gly Asn Pro Glu Phe Tyr Ile Glu Ile Asn Leu Trp Asn Ile Leu Asn Ala Thr

"7 306 

Gly Phe Ala Glu Met Thr Tyr Asn Leu Thr Ser Gly Val Leu His Tyr Val Gin 



"2 351 360 



CAA CTT GAC AAC ATT GTC TTG AGG GAT AGA AGT AAT TGG GTG Z GGA TAC CCC
Gln Leu Asp Asn Ile Val Leu Arg Asp Arg Ser Asn Trp Val His Gly Tyr Pro



"6 40S 414 423 



g" r '^"^ ^^'^ -^^^ °'=^ i^ z 

Glu Xle Phe Tyr Gly Asn Lys Pro Trp Asn Ala Asn Tyr Ala Thr Asp Gly Pro 

ATA CCA TTA CCC AGT Z GTT TCA Z CTA ACA G^C TTC TAT Z ACA ATC TCC 
Xle Pro Leu Pro Ser Lys Val Ser Asn Leu Thr Asp P.e Tyr Leu rZ He Lr 
<55 504 

S76 SBS 

TTA ACG AGA GAA GCr Trr nr^y. , 594 

512 621 

AIO ATA TG<= ATT TAC m =0A ITA o>A CCG C^^ CSC TCZ H ^ 

^^"^ 675 684 

z z z ™ ™ n r - 

11= lU V.1 A.„ oly nr p„ ,.1 „. 

" S «: r r r ™ - - ^ - - 1;^ .cc z 

•rp L„ M. A.. II, oly Trp ly, v.l »1. ph. Arg n. I,y. Th, pro II. 

765 774 -oi 

a" oT r ccc lie 

^ys aiu Oly T.. val Thr lie Pro Ty. aXy Ma P.e He Ser Val Ma Ma Z 



828 837 946 

ATT .OC TTA A.T TAC AOV O;. CTT T.C TT. 0.0 CAC lH ATT ^A 

ne se. s=. .eu Pro Asn T.r T.r OIu .eu X,eu CXu Asp Val aZu xie Ty 



873 



882 



ACT GAG TTT GGA ACG CCA AGC ACT IcC TCC GCC rr^ n.. 

- p.. p„ ... ... - z z: z z z u, ^ 

"■^ "6 
AAC ATA ACA CTA ACT CCT CTA GAT AGA CCT CTT ATT TCC TAA 3'
Asn Ile Thr Leu Thr Pro Leu Asp Arg Pro Leu Ile Ser *



Figure 18b (continued)